REPUTATION DIFFERENCES AMONG YOUNG
SCHOOL CHILDREN


J. W. MACFARLANE, M. P. HONZIK AND M. H. DAVIS* 
Institute of Child Welfare, University of California 


An important aspect of a child’s psycho-social environment is his reputation among his classmates. A rating procedure for measuring this aspect of the child’s environment was devised by Hartshorne and May¹ in 1929. Their method, which they have called the “Guess Who” test, consists of presenting brief character-sketches or descriptions to children and asking them to “guess” which children fit these descriptions. A child’s reputation-score is then determined by the number of mentions he receives from his classmates. Hartshorne and May² report a reliability coefficient of .95 for a battery of “Guess Who” items administered to children in the fifth and sixth grades. Symonds³ gave this type of questionnaire to five hundred eighty-five seventh-, eighth- and ninth-grade children and found the reliability of the various items or character-sketches to vary from −.26 to .97. Tryon,⁴ using a similar type of test, obtained an average reliability coefficient of the order of .70 for a group of approximately three hundred sixty eleven- and twelve-year-old children. Reliability coefficients for single items were as high as .95. Symonds found in his study that disciplinary problems are predicted better by the “Guess Who” questionnaire than by an Adjustment Questionnaire which the children answered themselves and which stressed personal rather than social adjustment. It seems clear that the “Guess Who” type of questionnaire has a fair degree of reliability with fifth- to tenth-grade children, some prognostic value with respect to problem behavior, and general psychological significance.

* The writers are indebted to Dr. Virgil Dickson for assistance in procuring data in the Berkeley Public Schools; to the principals and teachers of the elementary schools for their generous coöperation; to Mr. J. Delaney for able assistance in the statistical analysis of the data; and to statistical clerks supplied through W.P.A. Project No. 4428.


THE PRESENT STUDY 


The purpose of the present study is to determine the usefulness of the “Guess Who” technique as a measure of reputation differences among younger children in the first three school grades. To adapt this technique to younger children, certain changes had to be made in the types of items used and in the method of presenting them. Items which had yielded interesting results in the study made by Tryon were selected and revised on the basis of preliminary experimentation with seven- and eight-year-old children. The items finally selected and used are listed here. The names given the items are capitalized, and the questions actually asked the child follow. The plus and minus designations refer to the positive (or complimentary) and to the negative (or uncomplimentary) aspects of each pair of items.*

* Judgments as to whether the items were complimentary or uncomplimentary in nature were made arbitrarily by Macfarlane and Davis.


“GUESS WHO” QUESTIONNAIRE


1. (−) WIGGLY: “Which children wiggle a lot and can’t sit still?”
vs.
2. (+) QUIET: “Which children sit very still and quiet?”


3. (+) POPULAR: “Who are the ones everyone likes?”
vs.
4. (−) NOT MANY FRIENDS: “Who are the ones nobody likes very much?”


5. (+) SMILES FREQUENTLY: “Which children are always smiling and laughing?”
vs.
6. (−) SERIOUS, UNSMILING: “Which children don’t smile very much and seem sort of sad?”


7. (−) QUARRELSOME: “What children quarrel a lot?”
vs.
8. (+) NOT QUARRELSOME: “What children hate to quarrel?”


9. (−) SCARED EASILY: “Which children get scared of everything—are ‘fraidy cats’?”
vs.
10. (+) NOT SCARED: “Who are the bravest and almost never get scared?”


11. (−) BOSSY: “What children are bossy?”
vs.
12. (+) NOT BOSSY: “Which children let other children boss them?”


13. (−) POOR SPORT: “Which children are poor sports?”
vs.
14. (+) GOOD SPORT: “Who are good sports?”


15. (−) BASHFUL: “Which children are the most bashful?”
vs.
16. (+) NOT BASHFUL: “Which children aren’t the least bit bashful?”


17. (+) GOOD AT GAMES: “Which children are the best at outdoor games?”
vs.
18. (−) NOT GOOD AT GAMES: “Which children aren’t very good at games?”


19. (−) GETS MAD EASILY: “Which children get mad the easiest?”
vs.
20. (+) DOESN’T GET MAD EASILY: “Which ones don’t get mad much?”


21. (−) SISSY: “What boys are the worst sissies?”
vs.
22. (+) REAL BOY: “Who are the real boys—the regular fellows?”


23. (−) TOMBOY: “What girls are tomboys?”
vs.
24. (+) ACTS LIKE A LITTLE LADY: “What girls act like little ladies?”


25. (+) BEST FRIEND: “Who’s your best friend?”


The adaptation in the method of presenting “Guess Who” items to these young children consisted in interviewing each child, rather than administering the questionnaire as a group test. The children were taken individually from the classroom and told:


“I want to see how good you are at guessing the names of the children in your room. I’ll tell you just what these children do and you tell me who I’m talking about. It may be one person or two or three. Sometimes it might even be you.”


The questions were asked in a matter-of-fact manner, and the responses were recorded by a system of checks rather than by writing down the names of the classmates mentioned. The time taken by the children to answer the questions varied from about six to twenty minutes.

No emotional disturbance was caused by asking children their opinions of one another. It appears that the responses which the children make are simply their formed opinions, which they are (at this age level) quite willing to report to an impartial adult. However, since the principals and teachers were concerned when it was proposed that the children be asked, even indirectly, about their classmates, it was necessary to explain to each one (1) that no evidence of emotional disturbance had been noted in the children of schools previously investigated, and (2) that other schools had found the results of value in helping the children in their social adjustments.


SAMPLE AND DATA 


All the children in the first, second, and third grades of four different elementary schools in Berkeley, California,* were interviewed by Mrs. Davis.

There were a total of two hundred fourteen boys and one hundred 
sixty-two girls in these twelve classrooms. The number of children 
in each class ranged from twenty-six to forty-two, and the age range of 
these children was approximately six to ten years. 

The scores used in this study are either (1) the total number of 
mentions on the positive or negative items or (2) the algebraic sum 
of the negative and positive mentions in those instances where a score 
for the pair of items was wanted. It was always, of course, necessary 
to equate for size of class by considering the percentage of the class 
who mentioned each child on a given item. 
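The scoring rule just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' own procedure: the helper name `pair_score` is hypothetical, and the sign convention (positive mentions counted plus, negative mentions counted minus) is assumed so that scores run from −100 to +100, the range used for the distributions discussed later.

```python
def pair_score(pos_mentions: int, neg_mentions: int, class_size: int) -> float:
    """Hypothetical helper: signed reputation score for one child on one pair of items.

    Mentions are first converted to percentages of the class, so that
    classes of different sizes (twenty-six to forty-two pupils here) are
    comparable. The negative percentage is then subtracted from the
    positive one (the assumed form of the "algebraic sum").
    """
    if class_size <= 0:
        raise ValueError("class_size must be positive")
    if not (0 <= pos_mentions <= class_size and 0 <= neg_mentions <= class_size):
        raise ValueError("mentions must lie between 0 and class_size")
    pos_pct = 100.0 * pos_mentions / class_size
    neg_pct = 100.0 * neg_mentions / class_size
    return pos_pct - neg_pct


# A child named "Quiet" by 8 and "Wiggly" by 2 of 32 classmates:
print(pair_score(8, 2, 32))  # 18.75
```

Working in percentages rather than raw counts is what makes a score of, say, +20 mean the same thing in a class of twenty-six as in a class of forty-two.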


AGREEMENT AMONG THE CHILDREN IN THEIR JUDGMENTS 
OF EACH OTHER 


A rather simple but convenient method of estimating the reliability of the children’s judgments was used. Since the pairs of items presumably represent extremes of behavior, we should expect an unreliable pair of items to be one on which the same children were mentioned at opposite ends of the scale. We find that although the children tend to agree in judging that their classmates belong at one end of the scale or the other, there is more disagreement, or a larger proportion of contradictory votes, among these young children than occurred among the adolescent children studied by Tryon.⁵





* The classrooms selected for study were those which included children whose personality development had been followed for a period of seven years. Other methods of evaluating or measuring the personality adjustments of the children in the developmental study have included interviews with the parents, their teachers, and the children themselves. For more details of this more comprehensive developmental study, see Macfarlane, J. W.: Studies in child guidance: I. Method of data collection and organization. This is to be published as a monograph of the Society for Research in Child Development.


On the first pair of items, Wiggly-Quiet, there was some disagreement in rating forty-three per cent of the three hundred seventy-six children, and perfect agreement in rating fifty-seven per cent of the group (see Table I). By perfect agreement is meant the absence of any contradictory or dissenting votes with respect to a given child’s reputation of being Wiggly, Quiet, or not especially Wiggly or Quiet.*

* This method of indicating the reliability has the disadvantage that the nature and extent of the discrepancies are not shown. It was noted, however, that large discrepancies tended to occur principally on those items for which there was disagreement in rating a fairly large percentage of the group. Split-half reliabilities will be reported for the different grades as more data are accumulated.


TABLE I.—TEACHER’S OPINION AND SELF-ESTIMATES IN RELATION TO THE CLASS

                               Perfect        Teacher vs.    Self-estimate
Item                           agreement,     class, r       vs. class, r
                               pair (%)
 8. Not quarrelsome                           .44 ± .06       .02 ± .05
 9. Scared easily                 64          .47 ± .06       .65 ± .06
10. Not scared                                .25 ± .06       .08 ± .05
11. Bossy                         57          .60 ± .05       .28 ± .09
12. Not bossy                                 .09 ± .07       .04 ± .05
13. Poor sport                    52          .49 ± .07       .33 ± .08
14. Good sport                                .41 ± .06      −.02 ± .05
15. Bashful                       64          .09 ± .07       .15 ± .06
16. Not bashful                                  …               …
17. Good at games                 54          .65 ± .04       .09 ± .04
18. Not good at games                         .67 ± .05       .35 ± .06
19. Gets mad easily               52             …            .06 ± .07
20. Doesn’t get mad easily                    .41 ± .05       .01 ± .05
21. Sissy                         34         −.11 ± .09      −.15 ± .15
22. Real boy                                  .84 ± .07      −.07 ± .06
23. Tomboy                        35          .60 ± .09       .21 ± .08
24. Acts like a little lady                   .44 ± .09       .14 ± .07





166 The Journal of Educational Psychology 


The highest agreement was obtained for the pair of items Popular- 
Not Many Friends. Here the classmates agreed perfectly in their 
ratings of sixty-nine per cent of the group. The lowest agreement 
occurred for the pair of items Sissy-Real Boy. There was perfect 
agreement in rating only thirty-four per cent of the boys. Another 
item on which there was low agreement was the pair of items Tomboy- 
Acts Like A Little Lady (perfect agreement in rating only thirty-five per 
cent of the children). 

Probably this criterion of agreement, “no dissenting votes,” is too stringent. If we allow one dissenting or contradictory vote in the classification of agreement, we find that “agreement” is increased, on the average, by twenty-eight per cent of the group. The agreement in rating children as Popular-Not Many Friends is thus increased from sixty-nine to ninety-one per cent, and for the pair of items Sissy-Real Boy from thirty-four to sixty-four per cent.

This method of indicating the reliability of the items is not entirely 
satisfactory, but it does show that the children agreed in their rating 
of fifty-four per cent of the children on the average, and that there was 
disagreement of only one judgment in another twenty-eight per cent 
of the cases. Thus, there appears to be fair agreement in rating 
eighty-two per cent of the children, and disagreement of a greater 
magnitude than one vote in rating only eighteen per cent of the children 
on the various items. Although the reliability of these judgments is 
lower than that obtained on groups of adolescent children, it appears 
sufficiently high to warrant securing data of this kind in the first three 
grades by investigators interested in making personality studies of 
children at these ages. 
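The agreement figures above can be outlined in code. The sketch below assumes, since the paper does not state it explicitly, that a child's number of contradictory votes on a pair is the smaller of his positive and negative mention counts; the function name and the vote data are hypothetical. With the paper's averages, 54 per cent perfect agreement plus 28 per cent with a single dissent gives the 82 per cent of fair agreement, leaving 18 per cent with larger disagreement.

```python
def agreement_summary(votes):
    """Classify children by the number of contradictory votes they received.

    votes: list of (positive_mentions, negative_mentions) tuples, one per child.
    Assumption: contradictory votes = min(positive, negative), i.e. the
    minority end of the pair. A child with no mentions at either end counts
    as perfect agreement ("not especially" either).
    """
    if not votes:
        raise ValueError("need at least one child")
    n = len(votes)
    dissents = [min(pos, neg) for pos, neg in votes]
    perfect = sum(1 for d in dissents if d == 0)
    one_dissent = sum(1 for d in dissents if d == 1)
    return {
        "perfect_pct": 100 * perfect / n,
        "one_dissent_pct": 100 * one_dissent / n,
        "fair_pct": 100 * (perfect + one_dissent) / n,
    }


# Five hypothetical children: (Quiet mentions, Wiggly mentions)
print(agreement_summary([(5, 0), (0, 0), (3, 1), (2, 2), (0, 4)]))
# {'perfect_pct': 60.0, 'one_dissent_pct': 20.0, 'fair_pct': 80.0}
```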


INDIVIDUAL AND SEX DIFFERENCES 


The distributions of the boys’ and girls’ scores for the various pairs of items are shown in Fig. 1. The possible range of scores is from −100 to +100, since the entire class (one hundred per cent) could mention a child on either the negative or the positive aspect of each trait.

For the pair of items Wiggly-Quiet, we find that the boys’ scores range from −58 to +50, and the girls’ scores from −57 to +71. The girls tend to have reputations of being quieter than the boys, and the three children who obtained the largest percentage of votes for being Quiet are girls.

The distribution of scores for the pair of items Not Many Friends-Popular is fairly similar for the two sexes. The most popular child in the group is a girl who was voted Popular by sixty-eight per cent of the children in her class.

DISTRIBUTIONS OF “GUESS WHO” SCORES

Apparently there are few children in the first three grades who have the reputation of being Serious or Unsmiling. The children in the group who are most outstanding with respect to Smiling Frequently are two girls, with scores of forty-eight and fifty-two.

The pair of items showing the most marked sex difference is Quarrelsome-Not Quarrelsome. As might be expected, the boys are more quarrelsome than the girls, and the children with the greatest reputation for being Not Quarrelsome are girls.

A very slight but interesting sex difference is observable for the pair of items Easily Scared-Not Scared. The eight children outstanding for being Not Scared are boys, while the three children most Easily Scared are girls.

The distributions of scores for Bossy-Not Bossy are slightly skewed, 
there being more children who are Not Bossy. The extreme scores 
with respect both to being Bossy and Not Bossy are received by girls. 

The most outstanding Good Sport is a boy; while the child with the 
reputation of being the Poorest Sport is a girl. 

There is a slight tendency for the girls to be more bashful than the 
boys. The children with the highest scores for Bashful are girls and 
the children earning the highest scores for Not Bashful are boys. 

Outstanding reputations of being Good At Games are found among 
the boys, while the highest scores for being Not Good At Games are 
earned by the girls. 

There appear to be more Real Boys than Sissies, judging by the 
slightly skewed distribution. The girls’ distribution of Tomboy-Acts 
Like A Little Lady is more symmetrically distributed, there being girls 
with definite reputations of being Tomboys and others with equally 
clear-cut reputations of being Little Ladies. 

Summarizing these sex differences, the boys tend to be more Wiggly and more Quarrelsome than the girls; they are not as apt to be Scared or Bashful; and they are better Sports and better at Games than the girls.


INTERCORRELATIONS OF SCORES


The intercorrelations of the scores, computed separately for boys and girls, range from +.02 ± .05 to +.81 ± .02. The average of all the intercorrelations equals .52 for the girls and .44 for the boys, suggesting that there is more “halo” effect in the reputations of the girls than in those of the boys. Although “halo” effect is probably partially responsible for the magnitude of these intercorrelations, it is also possible that the more desirable traits do tend to be found in certain children.
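The “average of all the intercorrelations” is the mean of the off-diagonal entries of the correlation matrix. A minimal sketch follows, assuming the coefficients are averaged directly (the paper does not say whether any transformation, such as Fisher's z, was applied); the function name and the three-item matrix are invented for illustration.

```python
from itertools import combinations


def mean_intercorrelation(matrix):
    """Mean of the entries above the diagonal of a symmetric correlation matrix."""
    k = len(matrix)
    if k < 2 or any(len(row) != k for row in matrix):
        raise ValueError("need a square matrix with at least two variables")
    # Each unordered pair of variables (i < j) is counted once.
    pairs = list(combinations(range(k), 2))
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)


# Hypothetical correlations among three pair-scores
r = [
    [1.00, 0.50, 0.30],
    [0.50, 1.00, 0.70],
    [0.30, 0.70, 1.00],
]
print(round(mean_intercorrelation(r), 2))  # 0.5
```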











TABLE II.—INTERCORRELATIONS OF “GUESS WHO” SCORES

The pair of items showing the highest correlations with all the other items is Popular-Not Many Friends. The average of the correlations of this pair of items with all the others is .64 for the girls and .55 for the boys. In other words, this pair of items best measures that which is measured by the total battery of thirteen pairs of items in the questionnaire. The items showing the highest correlations with the popularity item are these:

The Popular girl is primarily a Good Sport (r = .81 ± .02); she is Not Quarrelsome (r = .77 ± .02); she is Quiet (r = .75 ± .02); and she is the Best Friend of a great many of the children (r = .70 ± .03).

The Popular boy is also Quiet (r = .72 ± .02); he is Good at Games (r = .70 ± .02), a Good Sport (r = .69 ± .02), and a Real Boy (r = .66 ± .03).
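The “±” figures attached to each r appear to be probable errors. Assuming the standard formula of the period, PE_r = 0.6745(1 − r²)/√N, with N = 162 girls or 214 boys (the sample sizes given earlier), the sketch below reproduces several of the printed values. The function name is hypothetical, and the identification of the ± values as probable errors is an inference, not a statement from the paper.

```python
import math


def probable_error_r(r: float, n: int) -> float:
    """Probable error of a Pearson r (assumed formula): 0.6745 * (1 - r**2) / sqrt(n)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


GIRLS, BOYS = 162, 214  # sample sizes reported for this study
for label, r, n in [("girls, Good Sport", 0.81, GIRLS),
                    ("boys, Quiet", 0.72, BOYS),
                    ("boys, Real Boy", 0.66, BOYS)]:
    print(f"{label}: r = {r:.2f} ± {probable_error_r(r, n):.2f}")
```

Rounded to two places, these agree with the printed .02, .02 and .03.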

The item showing the lowest correlation with popularity for both the boys and the girls is Not Bossy, with an r of .35 ± .05 for the girls and an r of .31 ± .04 for the boys. It would seem, therefore, that the popular boys and girls are Bossy at times.

The item showing the greatest sex difference in its correlations with the other items is Not Quarrelsome. The popular girl is Not Quarrelsome (r = .77 ± .02), but apparently the popular boy does quarrel occasionally (r = .48 ± .04). The girl who smiles frequently is Not Quarrelsome (r = .50 ± .04), but smiling and quarreling do not appear to be incongruous among the boys (r = .14 ± .05). The girls who are mentioned as best friends are Not Quarrelsome (r = .63 ± .03), but boys may have many friends and yet do their share of quarreling (r = .18 ± .04).

A comparison of the correlates of being a Real Boy and of Acting Like a Little Lady is of interest. The Real Boy is Good at Games (r = .75 ± .02), Popular (r = .66 ± .03), and Not Easily Scared (r = .66 ± .03); he is a Good Sport (r = .65 ± .03) and is sometimes Bossy (r = .27 ± .04). The girl who Acts Like a Little Lady is Not Quarrelsome (r = .70 ± .03); she is a Good Sport (r = .63 ± .03); she sits Quietly in class (r = .58 ± .04); she is Popular (r = .57 ± .04); but she is not noted for being Good at Games (r = .24 ± .05).


CORRELATIONS BETWEEN THE TEACHER’S JUDGMENT AND THE 
CLASSMATES’ OPINIONS 


Mrs. Davis not only asked the children which of their classmates fitted the descriptions, but also asked the teachers the same questions. It has, therefore, been possible to find the relation between the teacher’s and the classmates’ opinions for the various items. The biserial r’s range from −.11 ± .09 for the item Sissy to +.92 ± .03 for the item Not Many Friends.*
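A biserial r fits this comparison because the teacher's judgment is two-valued (the child was or was not named) while the class-vote score is continuous. The paper does not show its computing formula, so the sketch below uses the standard textbook form, r_bis = (M_p − M_q)/σ · pq/y, where y is the height of the unit normal curve at the point dividing the proportions p and q. It would be applied to one end of a pair at a time, as the footnote describes. The function name and the sample class are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def biserial_r(scores, named):
    """Biserial correlation between a continuous score and a dichotomy.

    scores: class-vote percentages, one per child.
    named:  True if the teacher named that child for the item.
    """
    if len(scores) != len(named):
        raise ValueError("scores and named must have the same length")
    hi = [s for s, f in zip(scores, named) if f]
    lo = [s for s, f in zip(scores, named) if not f]
    if not hi or not lo:
        raise ValueError("both groups must be non-empty")
    sigma = pstdev(scores)  # standard deviation of the whole group
    if sigma == 0:
        raise ValueError("scores have no variance")
    p = len(hi) / len(scores)
    q = 1 - p
    # Ordinate of the unit normal curve at the point splitting p from q.
    y = NormalDist().pdf(NormalDist().inv_cdf(q))
    return (fmean(hi) - fmean(lo)) / sigma * (p * q / y)


# Hypothetical class of eight: the teacher names three of them.
votes = [0, 4, 8, 12, 16, 20, 24, 28]
teacher = [False, False, False, True, False, False, True, True]
print(round(biserial_r(votes, teacher), 2))
```

Reversing the dichotomy (scoring the children the teacher did not name) simply reverses the sign of r, which is one reason computing the two ends separately can reveal asymmetries.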

Low correlations were obtained for the items Popular (r_bis = .20 ± .06), Unsmiling (r_bis = .11 ± .07), Not Bossy (r_bis = .09 ± .07), and Bashful (r_bis = .09 ± .07).

The items on which there are fairly high correlations between the teacher’s and the children’s opinions are Not Many Friends (r_bis = .92 ± .03) and Wiggly (r_bis = .81 ± .04). The conspicuousness of the children with Not Many Friends to both the classmates and the teacher is somewhat disturbing. The high correlation obtained between the teacher’s and children’s opinions as to the wigglers may possibly be explained if the teacher’s remarks to the children serve as the criterion by which certain children are judged Wiggly.

Other items on which the teacher and children agree fairly well are Good at Games (r_bis = .65 ± .04), Quiet (r_bis = .62 ± .05), Tomboy (r_bis = .60 ± .09), and Bossy (r_bis = .60 ± .05). Certain of these items are overt traits which are probably easily observed by the teacher and children.

The fact that the correlations are as high as they are suggests that 
the scores based on classmate opinion do have some validity, since 
only low correlations would have been obtained if the scores were 
completely unreliable and invalid. 


CORRELATIONS BETWEEN THE CHILD’S SELF-ESTIMATE AND THE 
CLASS VOTES 


The children were told that they could mention themselves on the inventory items, so it was possible to compute correlations between the children’s self-estimates and the class votes. These correlations are much lower than those obtained between the teacher’s and the children’s votes. The highest correlation was again obtained for the item Not Many Friends (r_bis = .53 ± .07). It would therefore seem that the teacher, the child himself, and all his classmates largely agree as to which children have few friends.

Correlations of .34 ± .07 and .35 ± .06 were obtained between the self-estimates and class votes for the items Wiggly and Not Good at Games, respectively. In general, we tend to find higher correlations between the self-estimates and class votes on the negative items than on the positive ones. The average of the correlations for the negative items is +.25, and for the positive items +.05. This would seem to mean that if children acknowledge a weakness or uncomplimentary trait they are probably right (or at least the other children agree with them). On the other hand, children who claim certain desirable traits are as often wrong as right (according to the other children). This finding is of considerable psychological interest, and it would be worth knowing whether it holds true at older age levels.

* These correlations were computed for both positive and negative ends of the scale separately because it was thought that there might be interesting differences in the correlations between the teacher’s and classmates’ opinions at the opposite ends of the scale.


SELF-ESTIMATES IN RELATION TO THE TEACHER’S JUDGMENT 


The relation of the self-estimates and the teacher’s judgments is 
practically zero. The method used in determining the relationships 
was the percentage agreement beyond that which might be expected 
by chance.'! Although the relationships between the teacher’s opinion 
and the self-estimates are low, the tendency was again noted for the 
teacher-child agreement to be higher on the uncomplimentary traits 
than on the complimentary ones. 
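“Percentage agreement beyond that which might be expected by chance” can be read in more than one way. One plausible reading, sketched below as an assumption rather than as the authors' exact method, treats each item as a 2 × 2 table (teacher named the child or not; child named himself or not), takes the observed percentage of children on whom the two judgments agree, and subtracts the percentage expected from the marginal proportions alone. The function name and data are hypothetical.

```python
def agreement_beyond_chance(teacher, self_report):
    """Observed minus chance-expected percentage agreement for two yes/no judgments.

    teacher, self_report: equal-length sequences of booleans, one entry per child.
    Returns percentage points; 0 means no more agreement than chance.
    """
    if len(teacher) != len(self_report) or not teacher:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(teacher)
    observed = sum(t == s for t, s in zip(teacher, self_report)) / n
    p_t = sum(teacher) / n       # share of children the teacher named
    p_s = sum(self_report) / n   # share of children who named themselves
    # Chance agreement: both say yes, or both say no, independently.
    expected = p_t * p_s + (1 - p_t) * (1 - p_s)
    return 100 * (observed - expected)


print(agreement_beyond_chance([True, False, False, False],
                              [True, False, False, False]))  # 37.5
```

Later indices such as Cohen's kappa divide this difference by the room left above chance (100 minus the expected percentage); that normalization is not applied here.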


SCORES OF INDIVIDUAL CHILDREN 


A summary of the class opinion about three different girls is shown 
in Table III. These three children were also members of the Guidance 
Clinic of the Institute of Child Welfare, University of California;‘ 
and it has, therefore, been possible to consider their scores in relation to 
other facts that are known about them. It may be seen, in Table III, 
that case 305 is a popular child; the teacher, sixty-one per cent of the 
class, and the child herself all mention that she is Popular; and no one 
says that she does not have many friends. This child has from an 
early age been considered one of the very best adjusted children studied 
at the Institute. She is a child full of confidence and charm—well 
adjusted to her family, herself, and her playmates. 

Case 301 presents a very different picture. She receives no men- 
tions for being Popular; no child mentions her as a Best Friend; and 
six out of the thirty-two children in the class report that she does not 
have many friends. She is, however, mentioned as being Serious or 
Unsmiling, Easily Scared, Bossy, a Poor Sport, Bashful, Not Good at 
Games, and one who Acts Like a Little Lady. This child is over- 
indulged at home, and has had poor eating and sleeping habits. She 
has been fidgety, very dependent, constantly demanding attention 
from parents and grandparents, fretful, whining and full of fears.

In comparison with cases 301 and 305, case 373 is consistently
ignored. She doesn’t even mention herself but is very aware of the 
other children, mentioning them twice as often as case 305, and over 
three times as often as case 301. This little girl (case 373) has shown 
many symptomatic problems in response to her tense, thwarted, 
irritable, and hard-working mother and to her inadequate, hypo- 
chondriacal father. She has had much strain in her home, and very 
few supports; she is shy, repressed, and eager for social support and 
affection. 

It is interesting to note that the popular child, case 305, received much more notice from the girls (sixty-nine per cent) than from the boys (thirty-one per cent), whereas for the less popular child (case 301) the reverse is true: seventy-five per cent of her mentions were made by boys.


SUMMARY 


1. Reputation differences among children in the first three school grades were determined by means of a modified “Guess Who” questionnaire, administered individually to three hundred seventy-six children in twelve different classrooms.

2. The agreement among the children with respect to their judg- 
ments of each other was fairly good for approximately eighty-two per 
cent of the children. Disagreements of more than one mention 
occurred for eighteen per cent of the cases. 

3. Interesting sex differences in reputation scores were noted. 
Although the distributions for the boys and girls show a considerable 
degree of overlapping, the boys do tend to be less Scared and less 
Bashful than the girls, but are more Wiggly, more Quarrelsome, Better 
Sports, and Better at Games than the girls. Relatively few boys have 
the reputation of being Sissies. However, a sizeable number of girls 
have reputations for Acting Like Little Ladies, and a sizeable number 
have reputations as Tomboys. 

4. Intercorrelations of the scores range from .02 to .81, with an 
average of .52 for the girls, and .44 for the boys. Comparison of the 
correlates of Popularity and Unpopularity at these ages shows different 
configurations for the boys and girls. 

5. The relation of the teacher’s and class opinion is fairly high (average r = .42), ranging from −.11 ± .09 (Sissy) to +.92 ± .03 (Not Many Friends).

6. The relation of the children’s self-estimates to class opinion is low, but the agreement is found to be definitely higher for the uncomplimentary than for the complimentary aspects of each pair of items.

7. Negligible relationships were found between the children’s 
self-estimates and the teacher’s opinion. 

8. Consideration of the reputation scores of individual children 
suggests that with respect to class opinion, each youngster lives in a 
very different psychological environment. It would seem important 
that this aspect of the child’s environment be taken into consideration, 
both in the study of the youngster’s adjustment and in the planning 
of any social therapeutic program.


REPORT ON THE YOUNG-ESTABROOKS
STUDIOUSNESS SCALE FOR USE WITH THE STRONG
VOCATIONAL INTEREST BLANK FOR MEN*


C. W. YOUNG AND G. H. ESTABROOKS
Colgate University


VALIDATING MEASURES OF STUDIOUSNESS


It has long been recognized that factors not measured by our college intelligence examinations contribute as much as intelligence, if not more, to scholastic achievement in college. For purposes of convenience, we shall give the trait name “studiousness” to the sum total of factors of personality and attitude which make for high college grades and which are not correlated with intelligence, recognizing, of course, that such a term may not be really appropriate for the description of all such factors.

It is natural to expect the studious individual to be characterized 
by certain personality traits or attitudes, but the many studies which 
have searched for some correlation between present-day tests of 
personality and college grades have failed to discover any consistent 
relationship, except for a possible slight positive correlation between 
scholarship and introversion. (See Stagner.’) 

Furthermore, as one of us has demonstrated in a recent article,’ 
attempts to construct a test of studious personality through the use of 
college standings as criteria for weighting individual test items have 
failed to eliminate properly the factor of intelligence. In the above-mentioned article it has been shown that the residual index of an individual’s grades relative to intelligence will give an index of studiousness, which is essentially a measure of the factors contributing to grade point average other than those measured by intelligence. This index does not exactly measure studiousness as it is defined above, since it probably includes factors (such as freedom from economic burdens) which are not properly factors of attitude and personality; nevertheless, it should include all attitude and personality factors and hence serve as an adequate criterion of their presence or absence.†





* We are indebted to the following individuals who have furnished us with data that are analyzed in this paper: Henry C. Mills, University of Buffalo; D. S. Parks, University of Toledo; and S. L. Brintle, Long Beach Junior College.

† The general formula for the residual index is $RI_y = Y - b_{yx}X$. For a studiousness index this may be translated into $SI = G - b_{GI}I$, where $G$ stands for grades and $I$ for intelligence. Actually, we used a less convenient variation of this formula, as described in our article referred to above.


Using the ACE Psychological Examination as a measure of intelligence, and average grades for the first three years of college as the measure of scholastic success, we calculated such a Studiousness Index for each of five hundred eighty-eight Colgate students of the classes of 1932 and 1933. All subjects were male, since Colgate is not coeducational. This measure correlated −.01 with intelligence and .94 with grade point average, the latter coefficient closely approximating the coefficient of alienation. Thus the assumption that the Studiousness Index measured factors contributing to scholastic success but not measured by intelligence was verified.
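The residual index of the footnote, SI = G − b_GI·I, can be sketched with an ordinary least-squares slope. Working in deviation scores is an assumption here (the authors say they used a less convenient variant). For an exact least-squares residual, the index is uncorrelated with intelligence by construction, and its correlation with grades equals the coefficient of alienation, √(1 − r²_GI). This is why the observed −.01 and .94 serve as a check. The helper names and data below are invented.

```python
import math
from statistics import fmean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def studiousness_index(grades, intelligence):
    """Hypothetical helper: residual of grades after their linear regression on intelligence."""
    mg, mi = fmean(grades), fmean(intelligence)
    # Least-squares slope b_GI of grades on intelligence.
    b = (sum((g - mg) * (i - mi) for g, i in zip(grades, intelligence))
         / sum((i - mi) ** 2 for i in intelligence))
    return [(g - mg) - b * (i - mi) for g, i in zip(grades, intelligence)]


# Invented data for six students: grade averages and ACE scores
grades = [2.1, 2.8, 3.0, 3.6, 2.5, 3.9]
ace = [40, 55, 60, 72, 58, 80]
si = studiousness_index(grades, ace)
r_gi = pearson(grades, ace)
print(round(pearson(si, ace), 6))  # ~0: uncorrelated with intelligence
print(round(pearson(si, grades), 4), round(math.sqrt(1 - r_gi ** 2), 4))
```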

Then, employing the individuals having the one hundred highest 
and one hundred lowest studiousness indices as contrasting groups, 
we made an item analysis of a number of tests to determine which 
items were differential for studiousness. These tests were: 

Colgate Personal Inventory, B2. 

Colgate Personal Inventory, C2. 

Strong Vocational Interest Blank for Men. 

Allport Test of Ascendance-Submission.

The X’s on the Pressey X-O.

The O’s on the Pressey X-O.

The items were weighted on the basis of the degree of probability 
that the difference between the studious group and the unstudious 
group was not a true difference. Thus, items for which the probability 
was between 44 and 14 were weighted 1; between 14 and Mo, 2; 
between 149 and 98, 3; etc. Addition of the weights is proportional 
to multiplication of the probabilities, and hence the total scores on the test are related to each other in units of probability of high or low studiousness ranking.*
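The weighting scheme works because a weight that rises by one step each time the probability shrinks by a constant factor is a logarithm of that probability, and adding logarithms corresponds to multiplying probabilities. The cut-off fractions used by the authors are not reproduced here, so the constant ratio k = 4 below is purely hypothetical, as is the sign convention marking which group an answer favors.

```python
import math

K = 4  # hypothetical constant ratio between successive probability cut-offs


def item_weight(p: float, favors_studious: bool, k: float = K) -> int:
    """Integer item weight that rises by one each time p shrinks by a factor of k.

    p: probability that the studious/unstudious difference is not a true one.
    The sign marks which group the response favors (an assumption).
    """
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    # Small tolerance so that p exactly on a cut-off (e.g. 1/16) is not
    # pushed into the lower band by floating-point error.
    w = math.floor(math.log(1 / p, k) + 1e-9)
    return w if favors_studious else -w


# Because weights track -log_k(p), adding them approximates
# -log_k of the product of the item probabilities.
answers = [(0.1, True), (0.01, True), (0.3, False)]
print(sum(item_weight(p, fav) for p, fav in answers))  # 4
```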

Each test enumerated above was scored for studiousness on the 
basis of these weightings for two hundred fifty students in the class of 
1934 at Colgate, none of the members of this class having been among 
the group on which the tests were validated. 

Table I shows in the first row the correlations between each of these tests scored for studiousness and grades for the first three years of college work. In the second row, correlations with intelligence are shown, and in the third row, partial correlations with grades, intelligence held constant. In the fourth row is shown the sum of all the weightings for each test, which indicates how well the test was capable of distinguishing between the studious and the unstudious within the group on which it was standardized, while the fifth row gives the ratio of item weightings to items, indicating the extent to which the test contains the right sort of material for measuring studiousness. Among the tests other than the Strong, the rank order of usefulness differs depending on whether it is judged by partial correlation with grades, by total weightings, or by the ratio of weightings to items. The Strong test is consistently superior to them all in all three respects. Combined with intelligence it correlates .55 with grades, an improvement of .16 over the correlation obtainable using intelligence alone. Throwing the other tests into the pool would increase this multiple correlation by only three or four points.

* An analysis of the “studious personality” in terms of the items which differentiate between the studious and unstudious groups will appear in a forthcoming article.


Table I. Summary of Results Showing the Power of Six Tests to Measure
Studiousness When Items Have Been Weighted on Basis of Agreement
with the Studiousness Index

                      Intelli-
                      gence     Strong    B2        C2        A-S       Pr. X's   Pr. O's
Grades                .39±.03   .35±.04   .26±.04   .21±.04   .14±.04   .08±.04   .06±.04
Intelligence                    -.10±.04  .02±.04   .17±.04   -.02±.04  -.15±.04  -.05±.04
Partial r, I constant           .42       .27       .16       .16       .15       .09
Weightings                      1335      263       237       93        406       389
Weightings/Items                1.06      .80       .99       .76       .81       .71

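The partial-correlation row of Table I can be checked from the first two rows. The sketch below applies the usual first-order partial correlation formula, r_GS.I = (r_GS − r_GI·r_SI) / √((1 − r_GI²)(1 − r_SI²)), with r_GI = .39; the six pairs of coefficients are those printed in the table. With these inputs it reproduces the printed values .42, .27, .16, .16, .15 and .09 to two decimals.

```python
import math


def partial_r(r_gs, r_gi, r_si):
    """Correlation of grades (G) with a test score (S), intelligence (I) held constant."""
    return (r_gs - r_gi * r_si) / math.sqrt((1 - r_gi ** 2) * (1 - r_si ** 2))


R_GI = 0.39  # grades with intelligence, first row of Table I

# (correlation with grades, correlation with intelligence) for each test in Table I
TABLE_I = {
    "Strong": (0.35, -0.10),
    "B2": (0.26, 0.02),
    "C2": (0.21, 0.17),
    "A-S": (0.14, -0.02),
    "Pressey X's": (0.08, -0.15),
    "Pressey O's": (0.06, -0.05),
}

for test, (r_gs, r_si) in TABLE_I.items():
    print(f"{test:12s} partial r = {partial_r(r_gs, R_GI, r_si):.2f}")
```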
It seems highly probable that a test of about a hundred carefully 
selected items, taking not more than twenty minutes to administer, 
correlating as highly with school achievement as the best intelligence 
tests, and showing zero correlation with intelligence, might be devel- 
oped. Prior to the construction of such an ideal test of studiousness, 
we believe that the Strong blank, scored for studiousness, may prove 
distinctly useful as a supplement to the intelligence test in both high 
school and college. The Strong blank has the additional advantage 
of being useful in personnel work for vocational guidance and for the 
measurement of interest maturity and masculinity. We have, there- 
fore, published a scale for scoring this blank for studiousness.* 

The corrected reliability coefficient for this scale, calculated by 
the split-halves method on two hundred ninety-five Colgate students,
is .80. Indications of its validity in combination with intelligence 
will be reported below. 
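The paper does not say how the halves were formed or which correction was applied. The Spearman-Brown formula is the usual correction for a split-half coefficient and is assumed in this sketch, as is an odd/even division of items; the half-scale scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the reliability of the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(half_a_scores, half_b_scores):
    """Correlate each student's scores on the two halves, then correct for length."""
    return spearman_brown(pearson_r(half_a_scores, half_b_scores))


# Hypothetical half-scale scores for five students.
odd_items = [1, 2, 3, 4, 5]
even_items = [2, 1, 4, 3, 5]
print(round(pearson_r(odd_items, even_items), 3))               # 0.8 (uncorrected)
print(round(split_half_reliability(odd_items, even_items), 3))  # 0.889

# A half-test r of about .67 steps up to a full-length reliability of .80.
print(round(spearman_brown(2 / 3), 2))  # 0.8
```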


* A similar scale for scoring the Strong Vocational Interest Blank for Women 
is now in process of preparation. 
LIMITATIONS AND ADVANTAGES OF THE YOUNG-ESTABROOKS SCALE


The writers believe that tests of studiousness—defined as the 
sum total of factors of personality and attitude which make for high 
scholastic achievement and which are not correlated with intelligence— 
should constitute as integral a part of educational testing programs 
as do tests of intelligence. We do not look upon the scale for use with 
the Strong Interest Blank, however, as a perfected instrument. As 
far as we know, it is the first test of the sort published in which the 
residual index has been used as the criterion for validation. (See* 
for a discussion of the superiority of the residual index as a criterion 
for studiousness.) Hence, it is the pioneer test for measuring person- 
ality traits related to scholastic success and not related to intelligence, 
and it doubtless suffers from some of the limitations of pioneering. 

Its chief limitation is that it is confined to the measurement of 
only those traits of personality or attitude which are included within 
the scope of the Strong Blank. It seems probable that if a wider 
search were made for items which would differentiate between studious 
and unstudious groups, an instrument that would be superior to our 
scale both in reliability and validity could be constructed. Neverthe- 
less, the employment of the Strong Blank, rather than an independent 
test, has certain advantages: because of its multiple usefulness, it
is a particularly economical instrument (especially from the point
of view of time saved in administration) for the measurement of any
variable. When the blank is already in use at an institution for 
vocational guidance or other purposes, the measurement of studious- 
ness requires only the scoring of blanks that have already been made 
out. 

Furthermore, the present scale does contribute something to the 
prediction of grades that is not measured by an intelligence test, 
although, as we shall point out below, the intelligence-studiousness 
composite may not prove superior to other means of prediction. 
More important, it serves a diagnostic function, since it isolates from 
the sum total of traits related to scholastic achievement those traits of 
personality or attitude which are not related to intelligence. The 
nature of these traits will be discussed in our forthcoming article on 
the studious personality. 

Finally, the scale constitutes a research instrument for the further 
study of the trait of studiousness, thus helping to pave the way for a 
more perfect test. For instance, before making plans for further test 
construction, we were interested in knowing whether our scale tested 
factors which were specific to scholastic success at Colgate University
alone, or whether the factors making for high grades in one school would be
found general the country over. Obviously, such a determination 
must be made before any more extensive study of studiousness at a 
single institution could be considered universally valid. In addition 
to this more general research interest, we wanted to know, prior to the 
publication of the scale, to what extent it could be universally applied. 


THE APPLICABILITY OF THE STUDIOUSNESS SCALE IN VARIOUS 
INSTITUTIONS 


Personnel workers at a number of schools and colleges were kind 
enough to send us Strong blanks that had been administered to their 
students, together with grades, intelligence scores and, in one case, 
scores on placement tests. We were thus enabled to score these 
blanks and determine whether or not the scale showed the same 
relationships to intelligence and scholastic success in these other 
schools that it did at Colgate. 


Table II. Summary of Results Showing the Usefulness of the Young-
Estabrooks Scale for Studiousness for Eight Groups of High-School
and College Students. G, Grades; S, Studiousness; and I,
Intelligence

Group                                       Sex  N     r_GS   r_SI   r_GI   r_GS.I  R_G.SI  R - r_GI
Lafayette High School, Buffalo              M    107   .35    .19    .42    .34     .52     .10
Bennett High School, Buffalo                M    94    .33    .21    .54    .32     .60     .06
Long Beach Junior College                   M    100   .29    .07    .30    .31     .42     .12
Toledo University, Freshmen                 M    188   .27    .14    .50    .28     .56     .06
Toledo University, Freshmen                 F    87    .44    .03    .37    .49     .59     .22
Colgate University, Juniors, Class of 1934  M    250   .35    -.10   .39    .42     .55     .16
Colgate University, Freshmen, Class of 1938 M    286   .27    .00    .54    .32     .60     .06
Colgate University, Freshmen, Class of 1939 M    273   .31    .00    .55    .31     .61     .06
Average                                                .33    .08    .45    .35     .56     .11

r_GS, r_SI, r_GI: correlations of grades with studiousness, studiousness with
intelligence, and grades with intelligence; r_GS.I: partial correlation of
grades with studiousness, intelligence constant; R_G.SI: multiple correlation
of grades with studiousness and intelligence combined; last column: R_G.SI
minus r_GI.
Table II gives a summary of correlations secured for eight groups, 
three at Colgate and five at other institutions. There seems to be a 
slight tendency for the test to correlate positively with intelligence
but, if the two high-school groups are omitted, this tendency is suffi- 
ciently low to be attributable to chance. It may be that among 
high-school students there is a true, though slight, correlation between 
intelligence and our scale. 

The intelligence test employed for all groups except the two at
Toledo was the same as that used in standardizing the scale. For the
Toledo groups, the Ohio State College Entrance test was used. The 
results seem to indicate that the type of intelligence test employed 
does not affect the usefulness of the scale. 

It was not our purpose in standardizing the scale to produce an 
instrument that would be used alone for the prediction of scholastic 
success. It was intentionally constructed to measure only part of the 
total array of factors making for high grades and to be employed in 
combination with an intelligence test to predict achievement. Con- 
sequently, the correlation coefficients between studiousness and grades 
are not properly coefficients of validity. The only true validity 
coefficients that appear in Table II are the multiple correlations of 
grades with intelligence and studiousness combined. One cannot, 
therefore, speak of the validity of the scale, but rather of its usefulness 
in improving the prediction made possible by the intelligence test. 
This can be indicated by the difference between the multiple correlation 
coefficient and the correlation between grades and intelligence, which 
difference is shown for each group in the last column of Table II. It 
will be seen that this improvement is not much greater for the Colgate 
groups than for others. In fact, it is greatest for the Toledo women, 
a group which differs from the criterion group not only with respect
to sex, but also with respect to the type of intelligence test used. It
seems reasonable to conclude that the trait “studiousness” remains
fairly constant from one college or high school to another and that the 
scale does not lose appreciably in validity by being standardized at a 
single institution. 
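The gain in the last column of Table II is the multiple correlation minus r_GI. A minimal sketch, using the standard formula for the multiple correlation of one variable with two predictors, reproduces two Colgate rows: the class of 1934 (r_GS = .35, r_SI = −.10, r_GI = .39, as in Tables I and II) gives R ≈ .55 and a gain of .16, and the freshman group with r_GS = .27, r_SI = .00, r_GI = .54 gives R ≈ .60 and a gain of .06. Rounded two-decimal inputs will not necessarily reproduce every row exactly.

```python
import math


def multiple_r(r_gi, r_gs, r_si):
    """Multiple correlation of grades (G) with intelligence (I) and studiousness (S)."""
    r_squared = (r_gi ** 2 + r_gs ** 2 - 2 * r_gi * r_gs * r_si) / (1 - r_si ** 2)
    return math.sqrt(r_squared)


# (r_GS, r_SI, r_GI) for two Colgate groups in Table II.
groups = {
    "Colgate juniors, class of 1934": (0.35, -0.10, 0.39),
    "Colgate freshmen, r_GI = .54": (0.27, 0.00, 0.54),
}

for name, (r_gs, r_si, r_gi) in groups.items():
    R = multiple_r(r_gi, r_gs, r_si)
    # Last column of Table II: improvement over intelligence alone.
    print(f"{name}: R = {R:.2f}, gain = {R - r_gi:.2f}")
```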

Table II seems to suggest one possible exception to this conclusion 
in its indication that the usefulness of the studiousness scale bears an 
inverse relationship to the correlation between intelligence and scho-
lastic standing. For the four groups for which the latter correlation
is .50 or more, the gain in correlation through combining the studiousness
scale is only .06 in each case. For the four other groups it is .10
or more in all cases and averages .15. Furthermore, the first four groups
show an average partial correlation between grades and studiousness
of .31, while the second four average .39. Finally, the average raw
correlation between grades and studiousness for the first four is .29, 
for the last four, .38. In short, as correlation of grades with intelli- 
gence increases, that between grades and studiousness decreases. 
The results from Colgate University illustrate this tendency perfectly. 
For the past six years, correlation coefficients between intelligence 
and grades, not only those here reported, but various others that have 
been calculated from time to time, have been steadily rising, from 
around .30 up to the .55 for the freshmen of the class of 1939. This 
rise has coincided with marked changes in conditions of instruction. 
A faculty composed to a large extent of men near the retiring age has 
been virtually replaced by one composed almost entirely of men under
forty, while at the same time, many changes have been made in the 
curriculum and methods of teaching. As Table II shows, partial 
coefficients between studiousness and grades appear to have fallen 
off regularly as correlations between grades and intelligence have 
increased. Apparently, certain not wholly definable conditions of 
instruction are capable of producing a high correlation between intelli- 
gence and grades, and under these conditions, the correlation between 
the studiousness scale and grades tends to be low. Vice versa, condi- 
tions of instruction making for low correlations between grades and 
intelligence seem to be associated with relatively high correlations 
between studiousness and grades. 
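The averages quoted above can be recomputed by splitting the eight groups of Table II at a grade-intelligence correlation of .50. The sketch uses the partial correlations and gains as read from that table; taking the threshold as inclusive is an assumption chosen so that each set contains four groups. With these inputs the means come out near .31 and .39 for the partials and .06 and .15 for the gains.

```python
# (group, r_GI, partial r_GS.I, gain of R_G.SI over r_GI), as read from Table II
TABLE_II = [
    ("Lafayette High School", 0.42, 0.34, 0.10),
    ("Bennett High School", 0.54, 0.32, 0.06),
    ("Long Beach Junior College", 0.30, 0.31, 0.12),
    ("Toledo University, men", 0.50, 0.28, 0.06),
    ("Toledo University, women", 0.37, 0.49, 0.22),
    ("Colgate juniors", 0.39, 0.42, 0.16),
    ("Colgate freshmen (r_GI = .54)", 0.54, 0.32, 0.06),
    ("Colgate freshmen (r_GI = .55)", 0.55, 0.31, 0.06),
]


def column_mean(rows, col):
    """Mean of one numeric column over a list of table rows."""
    return sum(row[col] for row in rows) / len(rows)


high = [row for row in TABLE_II if row[1] >= 0.50]  # grades closely tied to intelligence
low = [row for row in TABLE_II if row[1] < 0.50]

for label, rows in (("r_GI >= .50", high), ("r_GI < .50", low)):
    print(f"{label}: n = {len(rows)}, mean partial = {column_mean(rows, 2):.3f}, "
          f"mean gain = {column_mean(rows, 3):.3f}")
```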

Two possible explanations may be suggested. It may be that 
where intellectual factors predominate in determining grades there is 
less possibility of non-intellective factors being operative. Or it may 
be that under conditions where intelligence contributes a comparatively
large amount to grades, “studiousness” (defined simply as the “sum
total of non-intellective personality factors”) is composed of a
somewhat different set of traits, attitudes, or interests than under
conditions where intelligence and grades are only slightly correlated.
Since the scale for the Strong blank was standardized on a group for 
which the grade-intelligence correlation was low, it might thus be best 
fitted for use under similar situations. Possibly both explanations 
account in part for the phenomenon. At any rate, it appears that 
the scale will be especially useful, as far as improving prediction of 
scholastic achievement is concerned, at places where the correlation 
between intelligence and achievement is low. Furthermore, it may 
be well to modify our tentative conclusion that the trait studiousness 
does not vary appreciably from one college to another by adding the 
reservation that it may vary somewhat as the correlation between
grades and intelligence varies. 


THE RELATION BETWEEN STUDIOUSNESS AND ACHIEVEMENT TESTS 


The relation of the studiousness test to achievement tests is of 
interest because of the wide employment of these tests in educational 
measurement. Concerning this relationship we may raise two ques- 
tions: First, if we look upon achievement tests rather than instructor’s 
grades as the criterion of scholastic success, is the studiousness test 
related to the sort of achievement which they measure, and is it of any 
value as a supplement to an intelligence test in predicting that sort 
of achievement? Second, do such tests measure both studiousness 
and intelligence, and hence give as good prediction of teacher’s grades 
as intelligence and studiousness combined? Here we must make a 
distinction between the prediction of general scholastic success and the 
diagnosis of probable success in individual school subjects. Achievement,
placement, or aptitude tests may be useful for this latter function,
but neither the intelligence tests nor the studiousness scale is
constructed so as to perform it, and hence we can only discuss their 
value with respect to a general prediction. 

For the Colgate Freshmen of the classes of 1938 and 1939 we have 
scores on the General Culture and General Science tests of the Coöperative
Testing Service which were administered to them on entrance,
while for the two Buffalo high schools appearing in Table II, we have 
been supplied with scores on the Iowa Placement tests for Social 
Studies and for English, both of which are essentially achievement 
tests. To facilitate presentation, we have taken average correlation 
coefficients for the two Colgate classes and also average coefficients 
for the two high schools and have calculated the multiple correlations 
between the other three variables (namely, grades, intelligence, and 
studiousness) and each of the two pairs of achievement tests, so as 
virtually to combine each pair into a single measure of general scho- 
lastic efficiency. The coefficients thus secured are shown in Table III. 
The results for both the Buffalo and the Colgate groups are very 
similar. The correlation between achievement tests and intelligence 
is relatively very high, while between achievement tests and studious- 
ness it is only slightly higher than between studiousness and intelli- 
gence. The partial correlation between achievement and intelligence, 
studiousness constant, is .718 and .669 for the Colgate and Buffalo 
groups, respectively. The multiple correlation of achievement test 
scores with studiousness and intelligence combined is .729 and .698,
respectively, exceeding the correlation between intelligence and 
achievement tests alone by only .016 and .012. In short, studiousness, 
while it contributes slightly to achievement on these tests, is not nearly 
as important as it is to the type of achievement signified by instructor’s 
grades, while intelligence is of greater importance. Incidentally, this 
indicates that the failure of instructor’s grades to correlate as highly 
with intelligence as do achievement tests is not solely a result of the 
lower reliability of the former measures, since they do show a higher 
correlation with whatever traits of personality the studiousness test 
measures. The nature of these traits will be discussed in a later 
article, but it may be remarked in passing that traits which make for 
achievement on definitely assigned tasks ought to contribute more 
to instructor’s grades than to achievement tests, and that the studious- 
ness scale probably measures traits of that sort. 
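The two coefficients for achievement tests can be recomputed from Table III. In this sketch variable 2 is the composite achievement score, 3 intelligence and 4 studiousness; with the Colgate coefficients it gives a partial correlation of .718 and a multiple correlation of .729 (a gain of .016), and with the Buffalo coefficients a partial of .669. The function names are illustrative.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation r_xy.z (z held constant)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with predictors 1 and 2 combined."""
    return math.sqrt((r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2))


# Table III: 2 = composite achievement, 3 = intelligence, 4 = studiousness.
colgate = {"r23": 0.713, "r24": 0.184, "r34": 0.045}
buffalo = {"r23": 0.686, "r24": 0.270, "r34": 0.204}

for name, c in (("Colgate", colgate), ("Buffalo", buffalo)):
    # Achievement with intelligence, studiousness held constant.
    p = partial_r(c["r23"], c["r24"], c["r34"])
    print(f"{name}: partial r_23.4 = {p:.3f}")

# Achievement on intelligence and studiousness combined (Colgate).
R = multiple_r(colgate["r23"], colgate["r24"], colgate["r34"])
print(f"Colgate: R_2.34 = {R:.3f}, gain over r_23 = {R - colgate['r23']:.3f}")
```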

Table III. Average Intercorrelations between Grades, Intelligence,
Studiousness and Composite Achievement Test Scores for Colgate
Freshmen and Buffalo High-School Students. 1, Grades; 2,
Composite Achievement Test Score; 3, Intelligence (ACE);
and 4, Studiousness Scale

       1       2       3       4
1              .517    .477    .340
2      .567            .686    .270
3      .546    .713            .204
4      .293    .184    .045

Note: Colgate correlations are shown at the lower left, Buffalo correlations at
the upper right.


Taking up the question of the power of the achievement tests to 
predict scholastic success, it may be noted that in both groups the 
composite achievement test shows slightly higher correlation with 
grades than does intelligence. When intelligence is combined with 
studiousness, however, the multiple coefficients become .609 and .571. 
These multiple coefficients slightly exceed those secured when intelli- 
gence and achievement are combined, which are .602 and .543, respec- 
tively, as well as those secured when studiousness and achievement are 
combined, which are .599 and .557, respectively. In brief, the studiousness
scale, in combination with an intelligence test, offers a composite
predictive device that is superior to any combination of two or
more tests with which we have been able to compare it. At the same 
time, the differences between the above multiple coefficients are so 
slight as to be practically negligible, and it cannot be claimed that the
scale marks a great advance in our ability to predict scholastic success. 
But in this comparison of the intelligence-studiousness combination 
with achievement tests as predictive measures, it is worth while keeping 
in mind the matter of economy in administration. Thus, the ACE 
psychological examination requires only an hour to administer and 
the Strong blank only about forty minutes. The General Culture 
and General Science tests, on the other hand, require four hours
altogether. Economy and predictive power both considered, it seems
probable that the intelligence-studiousness composite provides one 
of the most efficient testing instruments for the freshman year that has 
yet been devised. 
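A short sketch shows where the three Colgate multiple coefficients compared above come from. It applies the two-predictor multiple-correlation formula to the Colgate (lower-left) entries of Table III and prints R = .609 for intelligence plus studiousness, .602 for intelligence plus achievement and .599 for studiousness plus achievement.

```python
import math

# Colgate coefficients from Table III (lower-left triangle).
# Variables: 1 = grades, 2 = composite achievement, 3 = intelligence (ACE), 4 = studiousness.
CORR = {(1, 2): 0.567, (1, 3): 0.546, (1, 4): 0.293,
        (2, 3): 0.713, (2, 4): 0.184, (3, 4): 0.045}


def r(a, b):
    """Look up the correlation between variables a and b (order does not matter)."""
    return CORR[(min(a, b), max(a, b))]


def multiple_r(y, x1, x2):
    """Multiple correlation of variable y with predictors x1 and x2 combined."""
    ry1, ry2, r12 = r(y, x1), r(y, x2), r(x1, x2)
    return math.sqrt((ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2))


pairs = {
    "intelligence + studiousness": (3, 4),
    "intelligence + achievement": (3, 2),
    "studiousness + achievement": (4, 2),
}
for label, (x1, x2) in pairs.items():
    print(f"grades on {label}: R = {multiple_r(1, x1, x2):.3f}")
```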


THE SUPERIORITY OF HIGH-SCHOOL RECORD FOR PREDICTING 
COLLEGE ACHIEVEMENT


In actual practice, most colleges today admit students on the 
basis of high-school record. Since this is the procedure at Colgate, 
we were interested in discovering whether or not achievement would 
be better predicted by the high-school record or by the intelligence- 
studiousness composite. Admission does not take place on the basis 
of “raw” high- or prep.-school standings, but on the basis of those
standings corrected with regard to the standards of the secondary 
schools from which they are sent. Thus, an average of B from a 
school of high scholastic standards will be given a B+, while one of 
C+ from a school of low standards will be credited C or C−. These
corrections are made by the admissions officer on the basis of a wide 
experience and long acquaintance with the schools from which the 
greater part of the student body is drawn, and, in spite of their
somewhat subjective nature, they do improve upon the uncorrected
standings. Table IV shows the average intercorrelations for the classes
of 1938 and 1939 between corrected high-school record, studiousness,
intelligence, and first-semester grades. The correlation between
grades and high-school record of .643 exceeds the multiple coefficient
between grades and intelligence-studiousness by .034. When intelligence
is combined with high-school record, the multiple coefficient is
.689, an increase of .046, but when studiousness is so combined, the
coefficient is .652, an increase of only .009.

Table IV. Average Intercorrelations between Grades, Intelligence,
Studiousness, and Corrected High-School Record for Colgate Freshmen

The reason that studiousness fails to add much to prediction when 
high-school grades are counted in is that it is correlated with high- 
school grades nearly as highly as with first-semester freshman grades. 
Indeed, it is improbable that any group of tests can be expected to 
improve predictions based on high-school average except by exceeding 
the latter measure in reliability, since, to the extent that it is reliable, 
scholastic success in high school should be an almost perfect indicator 
of probable success in college. The chief source of unreliability in 
high-school records is the difference in standards among the schools from
which a student body is drawn. This discrepancy is probably over- 
come considerably by the methods of correction described above. 
However, Williamson (2) in an article which we have had the privilege 
of reading prior to publication, has made the same finding at the 
University of Minnesota that we have at Colgate, namely, that, while 
the studiousness scale does measure something contributing to college 
achievement which is not measured by intelligence, it is capable of 
adding very little to the prediction secured from high-school grades, 
and that the intelligence-studiousness combination does not predict 
as well as high-school grades; and in this case the high-school grades 
are presumably uncorrected. 

It should be remembered, however, that correlation between 
high-school standings and college grades has at times been found to be 
inferior to that between intelligence scores and grades. Under these 
circumstances, and also in cases where predictive potentialities of 
both high-school records and intelligence tests are low, the studiousness 
scale might be used to secure an important improvement in predicting 
scholastic achievement. 


RÉSUMÉ


For each of five hundred eighty-eight male students at Colgate 
University an ‘‘index of studiousness” was calculated which was 
essentially an index of the residual of grades on intelligence. Using 
this index as a criterion, five different tests were subjected to item 
analysis to determine which items differentiated between studious 
and unstudious groups. When the items were weighted on this basis 
and the tests scored, it was found that all tests tended to show a
negligible correlation with intelligence, together with a low positive
relationship with grades, showing that a group of personality traits, 
attitudes, or interests was being measured which was unrelated to
intelligence and which contributed to scholastic standing. Of all the 
tests analyzed, the Strong Interest Blank showed evidence of being 
the most significant single measure. A scale for scoring this blank 
for studiousness has been published. This scale was correlated with 
grades and intelligence for thirteen hundred eighty-five individuals 
in eight different groups at five different high schools and colleges. 
It showed an average correlation of .35 with scholastic standing as 
opposed to an average of .45 between scholastic standing and intelli- 
gence. Correlation between the scale and intelligence centered around 
zero for the college groups, but averaged .20 for two high-school groups. 
When combined with the intelligence tests, the studiousness scale 
produced an average multiple correlation of .56 with grades, an average 
increase of .11. The scale does not correlate more highly with grades 
at the school where it was standardized than at other places, but it 
seems to show relatively high correlations with grades wherever the 
correlation between grades and intelligence is low and vice versa. It 
does not correlate very highly with achievement test scores. 

The combination of the intelligence test with the scale provides a 
composite test for predicting college or high-school achievement that 
is very efficient, compared to other test combinations, from the stand- 
point of providing a high degree of prediction with a small period of 
time devoted to testing. It does not appear, however, that this or any 
other combination of tests is likely to prove consistently superior to 
the high-school record in predicting college success, although a more 
perfect studiousness test might make possible a really important 
improvement over the predictive measures now available. The 
present scale should be chiefly useful as a diagnostic indicator of 
studiousness, defined as a group of personality traits or attitudes that 
contribute to scholastic success without being related to intelligence, 
and as an instrument of research for the further study of this group 
of traits and attitudes. 
FACTORS INFLUENCING THE VALIDITY OF A
SCHOLASTIC INTEREST SCALE¹


CHARLES I. MOSIER 
University of Florida 


The development by C. W. Young and G. H. Estabrooks at Colgate 
University of a scale for the measurement of Scholastic Interest² used
with Strong’s Vocational Interest Blank raises certain problems which 
must be answered if the scale is to be of value at other colleges and 
universities, and if it is to establish the generality of such a trait as 
Scholastic Interest. The major problem which arises is whether the 
scale is valid when used on subjects other than those on whom the test 
is standardized, or those very similar to them. 

The Scholastic Interest Scale was standardized on a group of 
Colgate University freshmen, most of whom were drawn from the 
cultural milieu of the Northeastern United States, and all of whom 
were pursuing Arts and Sciences work. The present study is designed 
to test the validity of the scale on a group of subjects differing from the 
standardization group in geographical location, cultural background, 
vocational and academic interests. The general plan of investigation 
has been to determine for a group of students at the University of 
Florida, most of whom are drawn from the State, and who are enrolled 
not only in Arts and Sciences courses, but in technical and pre-profes- 
sional curricula as well, the inter-relations between the following 
variables: 


1. Scholastic Interest Score (SI). 

2. ACE Psychological Examination Score (ACE). 

3. Honor Point Average for the first semester (HPA-1). 
4. Honor Point Average for the first year (HPA-2). 

5. Curriculum, e.g. Business Administration, Engineering. 


¹ Acknowledgement is due Professor E. D. Hinckley for his aid and
encouragement in this study.

² Young, C. W., and Estabrooks, G. H.: “Non-intellectual Factors in College
Scholarship,” Psychol. Bull., Vol. XXXI, 1934, p. 735.

Young, C. W., and Estabrooks, G. H.: Young-Estabrooks Scale for Measuring
Studiousness by Means of the Strong Vocational Interest Blank. Stanford
University, California: Stanford University Press, 1936.

The records of the Bureau of Vocational Guidance and Mental
Hygiene of the University were searched for those students who had
voluntarily taken the Strong Vocational Interest Blank during their
freshman year. Cases were selected from these to give a sample of
the freshman class which should be representative as far as distribution
of intelligence was concerned. This was accomplished by taking equal
numbers from each decile of the distribution of ACE scores. A final
group of two hundred forty cases (twenty-five in each decile except the
two lowest) was secured. For each student were recorded his curriculum,
ACE Psychological Examination total score, honor point average
for the first semester and for the first year, and Scholastic Interest
score.

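The sampling plan, equal numbers drawn from each decile of the ACE distribution, can be sketched as follows. The record layout, the `ace` field name, and the use of the whole freshman class as the reference distribution for the decile cut points are assumptions; a decile with fewer available records than the quota simply contributes all of them.

```python
import bisect
import random


def decile(score, reference_sorted):
    """Decile (0 = lowest, 9 = highest) of score within a sorted reference distribution."""
    rank = bisect.bisect_left(reference_sorted, score)
    return min(9, rank * 10 // len(reference_sorted))


def sample_by_decile(records, reference_scores, per_decile=25, seed=0):
    """Take up to per_decile records from each ACE decile."""
    rng = random.Random(seed)
    reference_sorted = sorted(reference_scores)
    bins = [[] for _ in range(10)]
    for rec in records:
        bins[decile(rec["ace"], reference_sorted)].append(rec)
    sample = []
    for members in bins:
        sample.extend(rng.sample(members, min(per_decile, len(members))))
    return sample


# Hypothetical data: ACE standard scores for a freshman class of 1,000.
rng = random.Random(42)
freshmen = [{"id": k, "ace": rng.gauss(0, 1)} for k in range(1000)]
chosen = sample_by_decile(freshmen, [f["ace"] for f in freshmen])
print(len(chosen))  # 250 when every decile has at least 25 students
```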
The Scholastic Interest Scale (SI) is a scale of weights to be used 
with the items in the Strong Vocational Interest Blank, and purports 
to measure “Studiousness,” defined as “That part of the student’s
honor point average which is not dependent upon or correlated with
intelligence as measured by the ACE Psychological Examination.”¹
Its intercorrelations with test intelligence and honor point average
for a group of Colgate University students were:²


VARIABLES                      r
SI and HPA                    .35
SI and ACE                   -.10
ACE and HPA                   .40
SI and HPA, ACE constant      .43


The American Council on Education Psychological Examination 
(ACE) needs no description here. In this study total score, expressed 
in units of standard score on the freshman class, was taken as a measure 
of test-intelligence. 

Honor Point Average was computed for the first semester (HPA-1) 
and for the freshman year (HPA-2) in accordance with the scheme 
used at the University of Florida: 


                 Honor Points per
Letter Grade     Semester Hour
A                 3
B                 2
C                 1
D                 0
E                -2

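Applying the scale above together with the rule stated in the next paragraph (the algebraic sum of honor points divided by the semester hours carried), the honor point average can be computed as in this sketch. The course list is hypothetical.

```python
# Honor points per semester hour at the University of Florida, from the table above.
HONOR_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0, "E": -2}


def honor_point_average(courses):
    """courses: list of (letter_grade, semester_hours) pairs for one period.

    The algebraic sum of honor points (E counts negative) is divided by all
    semester hours carried, including hours in which the student failed.
    """
    points = sum(HONOR_POINTS[grade] * hours for grade, hours in courses)
    hours_carried = sum(hours for _, hours in courses)
    return points / hours_carried


# Hypothetical semester: A in a 3-hour course, C in a 3-hour course, E in a 2-hour course.
print(honor_point_average([("A", 3), ("C", 3), ("E", 2)]))  # 1.0
```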

¹ Letter from C. W. Young to the author.
² Loc. cit.

The algebraic sum of the honor points earned in the stated period
divided by the number of semester hours carried yields the honor point
average for that period. The reliability of this as a measure of achieve-
ment is indicated roughly by the correlation which obtains between
HPA for the first and second semesters of the same year. On a group
of five hundred fifty-six cases this correlation coefficient was found to
be .689. The low reliability coefficient may be explained by: (1) the
homogeneity of the group due to the elimination of the poorer students
at the end of the first semester; (2) real changes in the quality of an
individual’s work such that a perfectly reliable measure would reveal
discrepancies between first and second semester; (3) inherent unrelia-
bility of grades as a measure. It is probable that all three of these
factors operate to some extent.
The cases were divided into four groups on the basis of the curricu-
lum of registration in the following way:

Group I ......... A.B., A.B. in Education, Journalism
Group II ........ B.S. and Pre-Medical students
Group III ....... Engineering, Agriculture, Pharmacy, Architecture
Group IV ........ Business Administration, Pre-Law

All calculations were carried out independently for each group and for
the total group of two hundred forty cases.

The results of the investigation are presented in tabular form.
Table I presents the distribution of students by curriculum, and shows
the composition of the four groups.








Table I. Distribution of students by curriculum (columns: Curriculum of
group; Number of cases; the only fully legible entry is Bachelor of Arts
in Education, 10)
Group I is seen to be composed primarily of students enrolled in
the liberal arts curricula; Group II of those students who might be 
expected to show an interest in the results of science for “science’s
sake”; Group III of those interested in the applications of science to
practical problems; and Group IV of those students interested pri- 
marily in the commercial and industrial aspect of university training. 
An attempt was made in the grouping to secure groups which should 
be homogeneous with respect to their interests and should differ as 
widely as possible from group to group. There is the possibility that 
individuals are misplaced in their choice of curriculum, particularly 
since this study is confined to freshmen where such mistakes are com- 
mon. However, the possibility of such isolated instances markedly 
affecting the results is negligible. 

Table II presents for each group the mean and standard deviation 
for each of the measures: 








Table II

                                  Group I   Group II   Group III   Group IV     Total
ACE Psychological Examination
  Mean ..........................   .345     -.271       .884        .146       .115
  Standard deviation ............  1.044     1.006     (illeg.)      .864       .954
Scholastic Interest
  Mean ..........................  -1.429   -23.741     -33.93     -39.722    -27.000
  Standard deviation ............  60.181    72.942     64.622      60.348     66.265
HPA-1
  Mean ..........................   1.310      .918      1.107       1.125      1.093
  Standard deviation ............    .811      .978       .909        .822       .897
HPA-2
  Mean ..........................   1.382     1.143      1.261       1.189      1.228
  Standard deviation ............    .764      .892       .813        .834       .838




















An examination of Table II reveals several facts of significance for
this study. Of these the most important is the steady decrease in
mean SI score from Group I to Group IV. The reliability of that
decrease is apparent from an investigation of the Critical Ratios
(Difference/PE Difference) of the differences between the means,
shown in Table III.

Table III indicates that the division into groups has been justified 
by the significant differences between the mean scores on the Scholastic 
Interest Scale. 
Table III.—Differences and Critical Ratios, Scholastic Interest Scores

Groups                    Difference     Critical Ratio
Group I-Group II             22.31            2.60
Group I-Group III            32.50            3.80
Group I-Group IV             38.29            4.88
Group II-Group III           10.19            1.23
Group II-Group IV            15.98            2.12
Group III-Group IV            5.79             .77
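A Critical Ratio divides a difference by its probable error, and the probable error is 0.6745 times the standard error. The minimal Python sketch below assumes the usual formula for the standard error of the difference between two independent means. Because the section does not state how many students fall in each curriculum group, the group sizes in the sketch are hypothetical. The sketch also backs out the probable error implied by the first row of Table III.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error (normal curve)


def pe_diff_means(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Probable error of the difference between two independent means."""
    return PE_FACTOR * math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def critical_ratio(diff: float, pe_diff: float) -> float:
    """Difference divided by its probable error (Difference/PE Difference)."""
    return diff / pe_diff


# Scholastic Interest means for Groups I and II (Table II).
diff = -1.429 - (-23.741)

# Table III gives CR = 2.60 for this difference, so its PE is about diff / 2.60.
implied_pe = diff / 2.60

# Hypothetical group sizes (NOT reported in this section) with the Table II SDs.
pe_example = pe_diff_means(60.181, 50, 72.942, 45)

print(f"difference = {diff:.2f}")  # 22.31, as in Table III
print(f"implied PE of difference = {implied_pe:.2f}")
print(f"CR with hypothetical Ns = {critical_ratio(diff, pe_example):.2f}")
```

The text appears to treat a ratio of about 3 as the threshold of significance. On that reading, the Group I-Group II ratio of 2.60 only "approaches significance."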











In Table IV are presented the results of the correlation studies. 
The variables correlated are represented in the first column by number 
according to the following scheme: 


1. Scholastic Interest Score.
2. ACE Psychological Examination, Standard Score.
3. HPA, first semester.
4. HPA, freshman year.


Table IV.—Zero-order, Partial, and Multiple Correlations

Corre-      Group I        Group II       Group III      Group IV       Total group
lation     r      PE      r      PE      r      PE      r      PE      r      PE
12       .029    .10    .169    .08    .053    .09   -.146    .08    .038    .05
13       .297    .10    .404    .07    .226    .09    .036    .08    .255    .04
14       .466    .09    .325    .08    .239    .09    .054    .09    .253    .05
23       .551    .07    .569    .05    .462    .07    .562    .06    .546    .03
24       .567    .08    .650    .05    .512    .07    .609    .06    .584    .03
13.2     .337    .09    .380    .07    .227    .09    .144    .08    .280    .04
14.2     .546    .08    .286    .08    .246    .09    .182    .09    .284    .05
23.1     .568    .07    .556    .05    .463    .07    .574    .05    .555    .03
24.1     .626    .07    .638    .05    .515    .07    .624    .06    .594    .03
R3.12    .619           .649           .506           .574           .594
R4.12    .724           .686           .554           .625           .628



































Since HPA-2 is, in all probability, a more reliable measure of
college work than is HPA-1, it will be more profitable to devote the
greater portion of the detailed discussion to the results in which
HPA-2 is utilized as the criterion of success. Before doing so,
however, it will be well to notice that the correlations involving
HPA-1 parallel those involving HPA-2, with the exception of the
discrepancies noted below. In every case the coefficients of correlation
for HPA-2 are higher than the corresponding coefficients with HPA-1,
with the single exception of the correlations with SI for Group II. A
second discrepancy is found in that r13 is lower for Group I than for
Group II.

Study of the results in Table IV yields the answers to the problems
posed for investigation. A consideration of the results for the Total
Group shows a correlation between SI and HPA-2 slightly lower than
that reported by Young and Estabrooks for the students of Colgate
University; the correlation between ACE and HPA is slightly higher.
The cause of the discrepancy between the results of this study and
those of Young and Estabrooks is to be found in a consideration of the
results group by group. It will be recalled that the students of Colgate
were all in Arts and Sciences curricula and hence are comparable with
our Groups I and II. The results for these two groups show
correlations between SI and HPA which compare favorably with the results
reported by the authors of the scale. The validity of the scale does
not seem to be affected by changes in the cultural background from
which the students are drawn, provided the students are in liberal arts
curricula.

There is a marked decrease in the size of the correlation coefficient
r14 as we pass from Groups I and II to Group III, indicating that the
SI scale is not as valid a measure for technical students as it is for
either the A. B. or B. S. students. The differences among Groups I,
II and III in the value of the coefficient r14 are not, however,
statistically significant. Not so with the corresponding differences
with Group IV. There the Critical Ratios of the Differences are:








Correlation    Difference             Critical Ratio
r13            Group I-Group IV            2.06
r13            Group II-Group IV           3.48
r14            Group I-Group IV            3.27
r14            Group II-Group IV           2.21
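For two correlations from separate groups of students, the usual critical ratio is the difference divided by the square root of the sum of the squared probable errors. The sketch below is a minimal check under that assumption. It uses the r and PE entries of Table IV for r13 and r14. Its ratios agree with the four tabled above to within about 0.04, and the small gaps are consistent with the PEs being rounded to two places.

```python
import math


def cr_difference_of_r(r1: float, pe1: float, r2: float, pe2: float) -> float:
    """Critical ratio of the difference between two correlations from
    independent samples: (r1 - r2) / sqrt(PE1^2 + PE2^2)."""
    return (r1 - r2) / math.sqrt(pe1 ** 2 + pe2 ** 2)


# (r, PE) pairs read from Table IV.
r13 = {"I": (0.297, 0.10), "II": (0.404, 0.07), "IV": (0.036, 0.08)}
r14 = {"I": (0.466, 0.09), "II": (0.325, 0.08), "IV": (0.054, 0.09)}

for name, coeffs in (("r13", r13), ("r14", r14)):
    for group in ("I", "II"):
        cr = cr_difference_of_r(*coeffs[group], *coeffs["IV"])
        print(f"{name}, Group {group}-Group IV: CR = {cr:.2f}")
```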











These results, establishing the statistical significance of certain
of the differences, together with the general trend, would lead to the
conclusion that the validity of the SI scale depends to a considerable
extent on the curriculum interests of the students tested. Validity
is not influenced by geographical location, nor by the shift from an
endowed institution to a State university, provided curriculum
interests are held constant. Moreover, the validity of the scale is not
markedly affected by a shift in the dominant interest of the subjects
from liberal arts to scientific studies. The validity is reduced for
technical students and students of applied sciences, the correlation
coefficients between SI and HPA for Group III being less than three
times their probable error, and it is reduced to approximately zero for
students of business administration and allied subjects. The relation
between ACE and HPA is fairly constant from group to group.

The use of the technique of partial correlation changes the values
of the coefficients slightly, but does not affect the conclusions above
noted. The coefficient r14.2 for Group I should be noted in particular,
since it indicates that for liberal arts students the relation between SI
and HPA holding ACE constant is very nearly as high as is the
correlation between ACE and HPA holding SI constant (.546 against .626).
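The partial coefficients in Table IV can be recovered from the zero-order ones with the standard first-order partial correlation formula. This minimal Python check is a sketch, not the authors' worksheet. Using the Group I entries of Table IV, it reproduces r14.2 and r24.1 to three places.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Group I zero-order coefficients from Table IV
# (1 = SI score, 2 = ACE standard score, 4 = HPA for the freshman year).
r12, r14, r24 = 0.029, 0.466, 0.567

r14_2 = partial_r(r14, r12, r24)  # SI with HPA, ACE held constant
r24_1 = partial_r(r24, r12, r14)  # ACE with HPA, SI held constant
print(f"r14.2 = {r14_2:.3f}")  # Table IV: .546
print(f"r24.1 = {r24_1:.3f}")  # Table IV: .626
```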

Multiple correlations between HPA-2 and the best weighted 
combination of SI and ACE range from .724 for Group I to .554 for 
Group III. The increase in the efficiency of prediction can best be 
realized from a study, not of the multiple correlation coefficient itself, 
but of k, the coefficient of alienation, which represents the ratio of the 
standard error of estimate of the predicted score to the standard 
deviation of the criterion measure. This ratio gives a direct measure 
of the error of the predicted value as a percentage of the error resulting 
from an unguided guess. Table V gives for each group these ratios 
for the prediction without any regression equation (row 1), for the 
prediction from ACE alone (row 2), and for the prediction from ACE 
and SI (row 3). 


Table V.—Coefficients of Alienation, HPA-2

Prediction    Group I   Group II   Group III   Group IV   Total group
k4             1.000     1.000      1.000       1.000      1.000
k42             .824      .760       .859        .793       .812
k4.12           .690      .728       .833        .781       .778
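Both the multiple correlations of Table IV and the coefficients of alienation of Table V follow from the three zero-order coefficients. The multiple R uses the two-predictor formula, and k is the square root of 1 minus the square of the correlation used for prediction. The minimal sketch below uses the Group I values. The other columns can be checked the same way.

```python
import math


def multiple_r(r_y1: float, r_y2: float, r_12: float) -> float:
    """Multiple correlation of y with the best-weighted combination of x1 and x2."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)


def alienation(r: float) -> float:
    """Coefficient of alienation k = sqrt(1 - r^2): standard error of estimate
    as a fraction of the criterion's standard deviation."""
    return math.sqrt(1 - r ** 2)


# Group I, Table IV (1 = SI, 2 = ACE, 4 = HPA-2).
r12, r14, r24 = 0.029, 0.466, 0.567

R4_12 = multiple_r(r14, r24, r12)
print(f"R4.12 = {R4_12:.3f}")              # Table IV: .724
print(f"k42   = {alienation(r24):.3f}")    # ACE alone, Table V: .824
print(f"k4.12 = {alienation(R4_12):.3f}")  # ACE and SI, Table V: .690
```

Read this way, adding SI to ACE for Group I cuts the error of prediction from about 82 to about 69 percent of the error of an unguided guess.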




















Study of Table V confirms the conclusions reached from the dis- 
cussion of Tables III and IV, that the use of the SI scale considerably 
increases the accuracy of prediction of HPA over that obtainable 
from ACE alone for students in Group I, increases it to a certain extent 
in Groups II and III, and not at all in Group IV. For the Total Group
of University of Florida students the increase in the accuracy of pre- 
diction from the SI Scale is very slight. 


SUMMARY AND CONCLUSIONS 


In an effort to validate the Scholastic Interest Scale for use with 
the Strong Vocational Interest Blank under new geographical and 
cultural conditions, and under varying curriculum interests, two hun- 
dred forty students of the University of Florida who had filled out 
the Strong Blank during their freshman year were selected to give a 
sample representative of the distribution of ACE test intelligence in 
the freshman class of the University of Florida. For these men the 
following data were collected: Curriculum of registration; honor-point 
average for the first semester, and for the freshman year; standard 
score on the American Council on Education Psychological Examina- 
tion, Total Score; and Scholastic Interest Score. The subjects were 
then grouped by curriculum into four groups, typically A. B., B. S., 
Engineering, and Business Administration. For each group were 
computed separately the means and standard deviations of each 
variable, and product moment coefficients of correlation between: 


1. Honor Point Average first semester and Scholastic Interest.
2. Honor Point Average first semester and Psychological Examination score.
3. Honor Point Average freshman year and Scholastic Interest.
4. Honor Point Average freshman year and Psychological Examination score.
5. Scholastic Interest Score and Psychological Examination score.


In addition there were computed for each group the partial coefficients 
necessary for the regression equations for Honor Point Average, first 
semester and freshman year, on Scholastic Interest and Psychological 
Examination Score. Coefficients of multiple correlation were com- 
puted for each regression equation. 

The results of the investigation revealed: 

1. The Scholastic Interest Scale is as valid a measure of that part 
of the student’s honor point average which is not due to intelligence 
when used with liberal arts students at the University of Florida as 
when used with the students of Colgate University. It is still valid, 
but less so, when used with students of the pure sciences. 

2. The Scholastic Interest Scale is not a valid measure of honor 
point average when used with University of Florida students in the 
technical schools, or in the school of Business Administration. 
3. There is a steady decrease in mean Scholastic Interest score
as one progresses from A. B. students through B. S. students and
Engineers to Business Administration students. The A. B. mean is
significantly higher than the means of Engineers or Business
Administration students, and the difference between the A. B. and B. S.
means approaches significance.

4. In predicting first semester honor point average there seems to 
be no difference between B. S. and A. B. groups in the value of the 
Scholastic Interest Scale. 

5. In predicting average for the entire year, however, the Scholastic 
Interest Scale is most valuable for the A. B. group and there is a steady 
decrease in its value from Group I to Group IV. The two preceding 
conclusions are borne out not only by the partial correlation coeffi- 
cients, but by a comparison of the coefficients of alienation. 

6. For those groups where the Scholastic Interest Scale was not of 
value, the relations between vocational interest and grades were 
obtained, and were found to be either insignificant, or due to a high 
correlation between vocational interest and test intelligence. 
THE REVISED MINNESOTA PAPER FORM BOARD


WILLIAM H. QUASHA 


New York City 
AND 
RENSIS LIKERT 


Life Insurance Sales Research Bureau, Hartford, Conn. 


Of all the tests used in the Minnesota Mechanical Ability Battery,²
the Minnesota Paper Form Board Test was the only paper and pencil
test to give satisfactory correlations with a criterion of mechanical
ability.

The Minnesota Paper Form Board Test consists of two series
entitled “Series A” and “Series B” respectively. Each series is made
up of fifty-six problems which tend to increase in difficulty. Its
advantages, from the point of view of vocational and educational
guidance and selection, are: (1) It is a group test, (2) It has a
relatively high validity coefficient (the Minnesota investigators report
the correlation between the Minnesota Paper Form Board Test and a
quality criterion of mechanical ability to be 0.52 raw and 0.61
corrected for attenuation), (3) It differentiates between the various
grades of proficiency of mechanical ability, and (4) It is inexpensive.

Fig. 1.—(a) A problem from the Minnesota Paper Form Board Test.
(b) A problem from the Revised Minnesota Paper Form Board Test.

The method of scoring the Minnesota Paper Form Board Test is 
both slow and unreliable, as is usually the case in subjectively marked 
tests. It is well recognized that whenever the judgment of the scorer 
enters into the correction of the test, there is likely to be variability 
of attitude from time to time. Moreover, when there are no real
standards by which to score a test, there will be differences of opinion
between various scorers. In general, other things being equal, as the
objectivity of a test is increased, scoring time is decreased, errors are
lessened, and the scoring cost is reduced.³

An attempt to eliminate the subjectivity of the Minnesota Paper 
Form Board seemed advisable. In order to do this it was necessary 
to change from a completion to a multiple-choice form of test. This 
modification made it possible to score the test in a simple, rapid, and 
objective manner. In Fig. 1 (a) is shown a problem from the Min- 
nesota Test and in Fig. 1 (b) a problem from the Revised Minnesota 
Paper Form Board Test. 

The Revised Minnesota Paper Form Board Test was developed 
during the period between June, 1932, and May, 1934. A brief descrip- 
tion of the more important steps taken is given below. 


PRELIMINARY REVISION 


Five alternate solutions, only one being correct, were devised for 
each problem of the Minnesota Test. These items were mimeographed 
and formed the Preliminary Revision, the purpose of which was to 
determine whether or not the multiple-choice test measured the same 
trait as did the Original Test. The correlation between scores on the 
Preliminary Revision and scores on the Original was 0.78. The relia- 
bility of the Original Test was 0.72, while for the Preliminary Revision 
it was 0.80. (In the foregoing correlations N = 104.) Thus the 
intercorrelation is as high as the reliability of the tests themselves. 

On the basis of these results and of the fact that the scoring of the 
Preliminary Revision was much easier and more reliable than that 
of the Original Test, it seemed that the Revision could probably be 
effectively substituted for the Original and was therefore worthy of 
being tried in a printed form. 


THE FIRST REVISION 


The recording and scoring system adopted was one which was 
simple from the point of view of both the person taking the test and 
the person doing the scoring. The test is so constructed as to provide 
blanks at the top of the page, in which the answers to each problem 
are to be placed. In the First Revision the answers were placed below 
the problem numbers, but it was found that scoring would be easier 
if the answers were placed above the numbers. Accordingly, in the 
Final (the Second) Revision this change was made. Sections of the
test (Second Revision) and the Scoring Key are shown in Fig. 2. 

Two series, “Series A” and “Series B” respectively, were prepared
to correspond to the two series of the Original Test.

Fig. 2.—Section from the Revised Minnesota Paper Form Board Test and
section of the Scoring Key.

In testing the First Revision, it was found that the interform
reliability (Series A vs. Series B) was 0.80 for one series and 0.89 for
the two series together. (The latter was determined by use of the
Spearman-Brown Prophecy Formula.) The group consisted of one hundred
ninety-two engineering-school freshmen. One-half of the group was
given “Minnesota Series A” first, “Revision Series B” second; after
a five-week interval, “Revision Series A” third, and “Minnesota
Series B” last. The other half was given “Minnesota B,” “Revision A”;
after a five-week interval, “Revision B” and “Minnesota A.”
It was also found that the interform reliability of the Minnesota Test
was 0.79 for one series. (It can be noted in passing that this checks
with the correlation 0.80 reported by the Minnesota investigators.)
The coefficient of correlation between the Revision and the Original
Test was again used as an index of validity, because, as pointed out
above, this indicates to what degree the two tests are similar as
measures of the same trait. This coefficient was found to be 0.75
(corrected for attenuation it is 0.94).
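The correction for attenuation divides an observed correlation by the square root of the product of the two reliabilities. The text does not say which reliabilities were used for the 0.94. In the minimal sketch below, the one-series interform values reported in this paragraph (0.80 and 0.79) are an assumption, and they reproduce that figure. Applied to the Preliminary Revision figures (r = 0.78, reliabilities 0.80 and 0.72), the same formula slightly exceeds 1. That is one way of reading the earlier remark that the intercorrelation is as high as the reliabilities themselves.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Estimated correlation between true scores: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)


# First Revision vs. Minnesota Test: r = 0.75, one-series reliabilities 0.80 and 0.79.
revision = correct_for_attenuation(0.75, 0.80, 0.79)
print(f"{revision:.2f}")  # 0.94

# Preliminary Revision vs. Original: r = 0.78, reliabilities 0.80 and 0.72.
preliminary = correct_for_attenuation(0.78, 0.80, 0.72)
print(f"{preliminary:.2f}")  # 1.03
```

A corrected value above 1 is not a possible true correlation. It signals that, within sampling error, the two forms appear to measure the same trait.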

One of the criteria used to validate the original Minnesota Paper
Form Board Test was mechanical drawing. The Minnesota investigators
report⁵ the correlation to be 0.33. As a further check on the
validity of the First Revision the term grades of freshmen engineers
in mechanical drawing and descriptive geometry were correlated with
their test scores on the First Revision. The coefficients of correlation
were 0.49 with mechanical drawing and 0.32 with descriptive geometry.

From norms established for groups of high-school seniors and 
college freshmen, it was evident that one series of the First Revision 
was easier than the other, and it was suspected, furthermore, from 
the somewhat irregular norms, that the items were not arranged in 
order of difficulty. The problems in both series of the First Revision 
were tested for order-of-difficulty. This was done by giving the test 
with no time limit to two groups. One group consisted of one hundred 
ninety-three college students; the other consisted of one hundred 
thirteen high-school students. The proper experimental precaution 
was taken to alternate the series among the subjects. The number of 
times each problem was done incorrectly was determined. The coeffi- 
cient of correlation between the items as originally placed in the test 
and their rank in terms of difficulty was found to be 0.56, indicating 
that the problems were not arranged in proper order. 
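The text does not name the coefficient used to compare printed position with difficulty rank. The rank-difference (Spearman) formula is one natural choice. The minimal sketch below applies it to eight hypothetical problems, with difficulty measured, as in the text, by the number of times each problem was done incorrectly. The eight error counts are invented for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


positions = list(range(1, 9))                # order of the problems in the printed test
errors = [12, 30, 18, 25, 60, 41, 55, 38]    # hypothetical counts of wrong answers
print(f"rho = {spearman_rho(positions, errors):.3f}")
```

A coefficient well below 1, like the 0.56 reported, means that some later problems are easier than earlier ones.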

Each problem was tested for internal consistency by computing a
bi-serial coefficient of correlation.⁶ It was found that all problems
correlated satisfactorily with the total score.
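The bi-serial coefficient relates a right/wrong item to the continuous total score, assuming a normally distributed ability underlies the item. The minimal sketch below uses one standard form of the formula, built from Python's `statistics.NormalDist`. It may differ in detail from the Dunlap and Kurtz formula cited in the footnotes, and the ten students' data are hypothetical.

```python
from statistics import NormalDist, mean, pstdev


def biserial_r(passed, totals):
    """Bi-serial r between a right/wrong item and total score.

    r_bis = (M_p - M_q) / sigma_t * (p * q / y), where y is the ordinate of
    the unit normal curve at the point dividing the areas p and q.
    """
    right = [t for ok, t in zip(passed, totals) if ok]
    wrong = [t for ok, t in zip(passed, totals) if not ok]
    p = len(right) / len(totals)  # proportion passing; must lie strictly between 0 and 1
    q = 1 - p
    unit = NormalDist()
    y = unit.pdf(unit.inv_cdf(p))  # same ordinate whether we cut at p or at q
    return (mean(right) - mean(wrong)) / pstdev(totals) * p * q / y


# Hypothetical data for one problem: 1 = solved correctly, 0 = not.
passed = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [45, 36, 38, 42, 30, 47, 44, 35, 40, 39]
print(f"bi-serial r = {biserial_r(passed, totals):.2f}")
```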

For the following reasons, it was deemed advisable to reconstruct 
the First Revision: 


(a) The reliability was not as high as was desired. 

(b) The two series were not equal in difficulty. 

(c) The format of the First Revision made it possible for the dishonest to
work problems before the starting signal was given.⁷


(d) There were not sufficient sample problems to permit the less able to 
know exactly what was required of them. 


(e) The order of the problems was not in terms of increasing difficulty. 


THE FINAL REVISION 


To have all the problems hidden from view necessitated the inclu- 
sion of a cover leaf on the test booklet. This, of course, required six 
pages instead of four. One page of the additional leaf was utilized 
for blanks requesting personal information and the other for directions 
and eight sample problems. The sample problems are the same for
both series.

This arrangement also provided space for eight new additional 
problems in each series. These supplementary problems had been 
prepared and printed on a separate sheet of paper, and were given 
along with Series A and Series B to the one hundred ninety-three col- 
lege and one hundred thirteen high-school students mentioned above. 
The order of difficulty and the bi-serial coefficients of correlation were
likewise obtained for these problems. 

The entire one hundred twenty-eight problems⁸ were ranked in
order of difficulty regardless of their source. In order to obtain two
equivalent forms, these problems were divided into two new Series
(AA and BB, respectively) in accordance with the following plan:⁹





























Fig. 3.—Method of ranking problems in terms of difficulty. (Diagram:
problems 1 to 128, in order of difficulty, divided between Series AA and
Series BB, with the means of the first five, second five, etc., problems
marked.)
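The footnote to this plan warns that an odd-even split (problems 1, 3, 5, ... to AA; 2, 4, 6, ... to BB) would make BB slightly harder. The text does not spell out the exact assignment in Fig. 3. The sketch below therefore compares the odd-even split with one common counterbalanced alternative, an ABBA pattern (1 and 4 to AA, 2 and 3 to BB, and so on). Treating ABBA as the authors' scheme is an assumption. Difficulty is represented simply by rank.

```python
def odd_even_split(ranked):
    """Problems 1, 3, 5, ... to AA; 2, 4, 6, ... to BB."""
    return ranked[0::2], ranked[1::2]


def abba_split(ranked):
    """ABBA alternation: in each run of four, the 1st and 4th go to AA and
    the 2nd and 3rd to BB (an assumed scheme, for comparison only)."""
    aa, bb = [], []
    for i, item in enumerate(ranked):
        (aa if i % 4 in (0, 3) else bb).append(item)
    return aa, bb


# Hypothetical: each problem's difficulty is just its rank, 1 (easiest) to 128.
difficulty = list(range(1, 129))

for name, split in (("odd-even", odd_even_split), ("ABBA", abba_split)):
    aa, bb = split(difficulty)
    print(f"{name:8s}: mean difficulty AA = {sum(aa) / len(aa):.1f}, "
          f"BB = {sum(bb) / len(bb):.1f}")
```

The odd-even split leaves BB one rank harder on average (65.0 against 64.0). The ABBA alternation balances the two series at 64.5.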


In every group, to our knowledge, where both Series AA and 
Series BB have been given, the norms for the two series have been 
found to be practically identical. 

A time limit of twenty minutes for each series was set for the Final
Revision, as contrasted with a twelve-minute time limit for the First
Revision, in order to obtain (1) a normal distribution of scores with a
large sigma and a wide range, and (2) a higher reliability.¹¹

The interform reliability of the Final Revision is 0.85 for one series 
(0.92 for the two series by using the Spearman-Brown formula), based 
on results of two hundred ninety high-school seniors applying for 
admission to New York University. Motivation, comprehension of 
instructions, and mechanics of administration were all excellent. The 
sigma for this group was 9.2 for Series AA and 11.1 for Series BB. 
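The Spearman-Brown prophecy formula predicts the reliability of a test lengthened k times from the reliability of one unit. With k = 2, it gives the two-series figures quoted for both the First Revision and the Final Revision. A minimal sketch:

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability when a test is lengthened k times."""
    return k * r / (1 + (k - 1) * r)


print(f"{spearman_brown(0.80):.2f}")  # First Revision, both series: 0.89
print(f"{spearman_brown(0.85):.2f}")  # Final Revision, both series: 0.92
```

The formula assumes the two series are parallel forms. The practically identical norms reported for Series AA and BB make that assumption plausible here.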

Norms have been established on several groups. The medians and 
quartiles for the various groups are shown in Tables I, II and III. 
Table I

                    Boys                         Girls
Age in years     N    Q1   Md   Q3           N    Q1   Md   Q3
 9              108   12   18   25          98    12   18   23
12               96   31   38   44          99    28   36   43
15              101   26   32   38         123    28   38   43










202 The Journal of Educational Psychology 


Table I shows Medians and Quartiles for Elementary (Public)
School children based on a Twenty-five-Minute Time Limit.¹²

Table II shows Medians and Quartiles for Elementary (Public)
School children based on a Twenty-Minute Time Limit.¹³

















Table II

                    Boys                         Girls
                 N    Q1   Md   Q3           N    Q1   Md   Q3
Age
  10            115   17   22   29         114    16   22   29
  11             77   19   25   33          68    16   23   30
Grade
  Fourth        122   15   22   28         110    13   19   26
  Fifth         135   19   25   31         135    18   23   30
  Sixth                                    100    20   27   34





























Table III shows Medians and Quartiles for various groups based
on a Twenty-Minute Time Limit.


























Table III

Group                                        N     Q1   Md   Q3   Reference
Engineering school students                                           15
  First year (freshmen) ................    344    33   43   48
  Third year (middlers) ................    145    40   47   52
  Fourth year (juniors) ................    212    41   47   52
  Fifth year (seniors) .................    238    41   46   51
Liberal arts college freshmen ..........    247    33   38   44
High school seniors ....................   1288    33   39   45     16
Printers' apprentices ..................    173    33   39   44
Adults                                                                17
  Male, sixteen to twenty-five years ...    147    26   34   39
  Male, twenty-six to sixty years ......     76    23   31   36
  Female, sixteen to twenty-five years .    129    25   34   40
  Female, twenty-six to sixty years ....     84    22   32   37
(illegible) ............................    100    23   31   37     18
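The norms above are reported as the lower quartile, median and upper quartile of each group's raw scores. The sketch below shows how such figures can be computed from raw scores with Python's `statistics.quantiles`, which defaults to the "exclusive" interpolation method. It also shows how a new score can be placed against a norm row. The original norms may have been computed from grouped frequency distributions, so interpolation details could differ slightly, and the raw scores here are hypothetical.

```python
from statistics import quantiles


def quartiles(scores):
    """Q1, median and Q3 of a list of raw scores ('exclusive' method)."""
    q1, md, q3 = quantiles(scores, n=4)
    return q1, md, q3


def quarter_of(score, q1, md, q3):
    """Quarter of the norm group a score falls in: 1 = lowest, 4 = highest.
    A score equal to a quartile is placed in the upper quarter (assumed convention)."""
    if score < q1:
        return 1
    if score < md:
        return 2
    if score < q3:
        return 3
    return 4


# Hypothetical raw scores for a small group.
print(quartiles([22, 25, 28, 31, 33, 36, 38, 40, 43, 47, 52]))  # (28.0, 36.0, 43.0)

# A score of 45 against the engineering freshman norms in Table III
# (Q1 = 33, Md = 43, Q3 = 48) falls in the third quarter.
print(quarter_of(45, 33, 43, 48))  # 3
```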

CONCLUSIONS 


As a result of this investigation, a multiple-choice form of the Paper 
Form Board Test of the Minnesota Mechanical Ability Battery has 
been prepared, tested, and standardized. This new test is called the 
Revised Minnesota Paper Form Board Test. The two series of the
Final Revision are equal in difficulty, internally consistent, highly 
reliable, and valid. 

The Revised Test has definitely improved on the Original Minne- 
sota Test in the following points: (a) The possibility of dishonest 
persons starting before the signal is minimized; (b) Practice-problems 
have been included to aid those taking the test to understand what is 
required of them; (c) The directions are simplified and the test made 
self-administering; (d) The scoring of the Revised Test is simpler, more 
rapid, and objective. In contrast to the rate of marking the Original 
Test at ten tests per hour by a skilled person, the Revised Test can 
be marked and scored by less competent individuals at the rate of at 
least sixty tests per hour. 

The Revised Test has been found to have a higher reliability than 
the Original Test. The Minnesota investigators reported the relia- 
bility of the Original Test to be 0.80. In this investigation the relia- 
bility of the Original Test was found to be 0.72 and 0.79. The 
reliability of a single series of the Revised Test is 0.85, and 0.92 when 
both series are administered. 

The Revised Minnesota Paper Form Board Test appears to be a valid 
substitute for the Original Minnesota Test. This is shown by the fact 
that the lowest correlation secured between the two tests is 0.75 which, 
corrected for attenuation, is 0.94. 

Further evidences of the validity of the Revised Test are: 


(a) The higher test scores received on the Revised Test by students in 
engineering and allied mechanical vocations than those received by non- 
mechanical groups. 

(b) The relatively high positive correlation between scores on the First
Revision and success in mechanical drawing and descriptive geometry. 


Norms have been prepared for a number of different groups and 
additional norms on other groups are in preparation. Since the
norms for the two series are identical, either series may be used
interchangeably.
3. For a further discussion of the necessity for objective scoring techniques, see
Paterson, et al.: Op. cit., supra, note 2, p. 30.

4. Henceforth “Minnesota Paper Form Board Test” will be used interchangeably
with “Minnesota Test” and “Original Test,” while “The Revised Minnesota
Paper Form Board Test” will hereafter be called “The Revised Test”
or “The Revision.”

5. Paterson, et al.: Op. cit., supra, note 2, p. 430.

6. Bi-serial coefficient of correlation. For formula see Dunlap and Kurtz:
Handbook of Statistical Nomographs, Tables and Formulas. Yonkers, N. Y.:
World Book Co., 1932, p. 42. For discussion of its application see Quasha:
Op. cit., supra, note 1, p. 42.

7. The test was so constructed that the top half of the first page was devoted to
directions and a sample problem, and the bottom half consisted of the first
eight test problems.

8. Combining the sixteen additional problems with the fifty-six in both Series A
and Series B.

9. This method was used in preference to an odd-even procedure (i.e., problems
1, 3, 5, etc. to Series AA and problems 2, 4, 6, etc. to Series BB), which would
have resulted in Series BB being slightly more difficult than Series AA.

10. This relationship was found to be true in groups C, D, F, and in the first
two hundred ninety cases in group E (paragraph 6 infra).

11. For a discussion of the effect of increasing the time limit on reliability see
Paterson, et al.: Op. cit., supra, note 2, pp. 28-29.

12. Based on data supplied through the courtesy of Henry E. Garrett, Alice I.
Bryan, and Ruth E. Perl from tests given to children in the public schools of
Kearny, New Jersey. See Garrett, Bryan and Perl: “The Age Factor in
Mental Organization.” Archives of Psychology, No. 176.

13. All except sixth-grade girls were supplied through courtesy of Charles McD.
Morris. Norms for sixth-grade girls are based on data supplied through the
courtesy of S. Jean Wolf. See Wolf: A Comparative Study of Two Groups
of Girls of Relatively Equal Intelligence but Differing Markedly in Achievement.
Unpublished Ph.D. thesis in New York University Library, 1936.

14. Tests administered at the East Side Vocational High School, New York
City, by Aurelia Cannavo under the supervision of J. Edward Mayman.

15. All except one hundred twenty cases in the freshman group were based on
data supplied through the courtesy of Stanley G. Estes, Northeastern
University, from tests given to eight hundred nineteen students in the
College of Engineering, Northeastern University, Boston. The one
hundred twenty cases were given at the College of Engineering of New
York University.

16. Based on data supplied through the courtesy of Charles McD. Morris.

17. Based on data secured from clients of the Adult Guidance Bureau, New York
City; supplied through the courtesy of Layton Hawkins, J. Edward Mayman,
Cornelia Beall, Joseph N. Feuerburgh, Aurelia Cannavo, and S. M.
Blumenthal.

18. Based on data supplied through the courtesy of Irving Lorge, Teachers
College, Columbia University, from tests given to CWA workers.
THE REDUCTION OF DATA SHOWING NON-LINEAR
REGRESSION FOR CORRELATION BY THE ORDINARY
PRODUCT-MOMENT FORMULA; AND THE
MEASUREMENT OF ERROR DUE TO
CURVILINEAR REGRESSION¹


THOMAS V. MOORE 
Catholic University of America 


Psychology shares with all sciences the difficulty of obtaining abso- 
lute scales of measurement. We have in our science no absolute 
standards. If we measure an ability by one scale, it might just as well 
be measured by some other scale. 

When we come to the problem of the correlation of human abilities, 
this problem begets various difficulties. One derives from the fact 
that linear or curvilinear regression is a function of the scale of 
measurement. 

Consider for a moment Fig. 1. Let us suppose that ability Y has
a linear homogeneous scale of measurement along the line OA, and
ability X such a scale along the line OB. Then the two abilities are
related by the regression line OP₁. Let now the y-scale be bent back
along the circumference of a circle with radius CO and equal segments
of the circumference projected along the line OA. The two variables
are now related by the function y = sin x, and the points of intersection
of the x and y values give us the curved line of regression OP₂. There
is still the same perfect correlation between all the x and y values, but
if we were to attempt to correlate them by the ordinary product-moment
formula, our correlation would fall short of unity. It amounts,
as a matter of fact, to 0.96.
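The point can be checked numerically. Even when y is an exact function of x, the product-moment r falls below 1 as soon as the function is curved. The minimal sketch below assumes eleven equally spaced x values from 0 to π/2. The exact figure depends on the range and spacing of the points in Fig. 1, which the text does not give, so the sketch is not expected to reproduce 0.96. On these assumptions the value comes out a little above .97.

```python
import math


def pearson_r(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Assumed: eleven equally spaced x values from 0 to pi/2 (0 to 90 degrees).
xs = [i * (math.pi / 2) / 10 for i in range(11)]
ys = [math.sin(x) for x in xs]  # y is an exact function of x

print(f"r(x, 2x + 1) = {pearson_r(xs, [2 * x + 1 for x in xs]):.3f}")  # linear: 1.000
print(f"r(x, sin x)  = {pearson_r(xs, ys):.3f}")                      # curvilinear: below 1
```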

It is easy to see that perfect linear regression requires either of two 
things: 

(a) That a unit anywhere on the scale be equal to a unit anywhere 
else on the scale, e.g., that all test items be of equal difficulty, or 

(b) That the progression in difficulty be at the same rate in the 
two measures correlated. This would mean that the two scales be 
bent in opposite directions with the same curvature and degree of 
bending. We then have, e.g., sin x = sin y and no longer x = sin y,
which obtains when only one scale is bent along the circumference of a
circle.

¹ This paper was read at the December, 1935 meeting of the Psychological
Section of the American Association for the Advancement of Science. It is now
published with some additions due to the kindly suggestion and criticism of
Doctor Lorge of Columbia and Professor Furfey of the Catholic University.

An approximation to condition (b) often obtains in psychological
measures, inasmuch as the items in all the tests used may be arranged
in an ascending order of difficulty.

When dealing with the intercorrelation of a number of variables we will, from time to time, come across one or more that give curvilinear regression. The eta-values usually calculated in such situations are not comparable with r-values, and, furthermore, reliable eta-values require a relatively large number of cases. It would be best to lay down the requirement that in any system of intercorrelations the scales of measurement be such as to give linear regression between all combinations of the variables involved. A relatively simple technique for doing this is the problem of the present study.

Figure 1. Bent scale and curvilinear regression.

Let us consider now Fig. 2. The curve indicates a regression line through the mean values of the y-variable. The brackets indicate the distance between the means and a point on the ordinate produced to where it meets the chord of the regression line. If the length of the bracket, taken with its proper sign, be added to every y-value in the corresponding column, the mean for the column will fall on the straight line which, in the figure, constitutes a chord drawn through the terminal points of the regression line. When this is done for all columns, the scale of measurement in y will have been changed so that all means fall on a straight line. The scatter about the individual means remains absolutely unchanged; and so we have not tampered with the observations other than to change the scale of measurement in one of the
variables. Thus corrected, the correlation in question rose from 
8 to .8969. 
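The column-by-column correction can be sketched as follows. This is a minimal illustration under stated assumptions: each distinct x-value is one column, the chord is drawn through the means of the first and last columns, and the data are hypothetical (y rising as x squared, with a fixed spread of -2, 0 and +2 in each column), not the data behind the .8969 above. The helper names `pearson_r` and `chord_correct` are our own.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def chord_correct(xs, ys):
    """Add to every y in a column the signed 'bracket' between the column
    mean and the chord through the first and last column means."""
    columns = defaultdict(list)
    for x, y in zip(xs, ys):
        columns[x].append(y)
    means = {x: sum(v) / len(v) for x, v in columns.items()}
    x_first, x_last = min(means), max(means)
    slope = (means[x_last] - means[x_first]) / (x_last - x_first)
    offset = {x: means[x_first] + slope * (x - x_first) - m for x, m in means.items()}
    # Each column is shifted as a whole, so the scatter about its mean is unchanged.
    return [y + offset[x] for x, y in zip(xs, ys)], offset


# Hypothetical data: ten columns, y = x**2 with a spread of -2, 0, +2 in each.
xs = [x for x in range(1, 11) for _ in range(3)]
ys = [x * x + d for x in range(1, 11) for d in (-2, 0, 2)]

corrected, offset = chord_correct(xs, ys)
r_before = pearson_r(xs, ys)
r_after = pearson_r(xs, corrected)
print(f"r before correction: {r_before:.4f}; after: {r_after:.4f}")
```

Because every value in a column moves by the same constant, the deviations within each column, and hence the scatter the text requires to stay untouched, are identical before and after the correction.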

Any straight line other than the chord will do just as well.¹ All such straight lines passing through the origin would merely relate the old y to the new y′ by the relation y = ky′, and the correlation will remain unchanged.

Consider for a moment the group of lines in Fig. 3. The regression of any one of the straight lines may be changed to that of one of the others by adding to its ordinates the values indicated by the brackets. This is true also of the curved line. In the case of the straight lines, the ordinates on one straight line are constant multiples of those of another straight line. This is not true of the ordinates of a straight and a curved line, but a curved line may be projected on a straight line, and this means merely changing the original scale of measurement.

In correlational problems we have two regression lines and, there- 
fore, two possibilities in correcting for curvilinear regression. 

In the old way of calculating eta-values it was generally recommended that one obtain both eta-values and take their mean. One could do the same in the method outlined here, but in general it would not be advisable.
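For comparison, the correlation ratio η of y on x, which the text contrasts with r, can be computed from grouped data. A minimal sketch with hypothetical data and our own helper names; it treats each distinct x as one array (column). Because η uses the actual column means rather than a straight line through them, it can never be smaller than |r| for the same grouping, which is one reason the two are not directly comparable.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def eta(xs, ys):
    """Correlation ratio of y on x: square root of the between-column sum of
    squares over the total sum of squares, one column per distinct x."""
    columns = defaultdict(list)
    for x, y in zip(xs, ys):
        columns[x].append(y)
    grand_mean = sum(ys) / len(ys)
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in columns.values())
    return math.sqrt(ss_between / ss_total)


# Hypothetical data: y = x**2 with a spread of -2, 0, +2 in each column.
xs = [x for x in range(1, 11) for _ in range(3)]
ys = [x * x + d for x in range(1, 11) for d in (-2, 0, 2)]

r = pearson_r(xs, ys)
eta_yx = eta(xs, ys)
print(f"r = {r:.4f}, eta of y on x = {eta_yx:.4f}")
```

Computing η of x on y in the same way requires grouping y into class intervals; when every y value is distinct, each "column" holds a single case and η is trivially 1, which illustrates why reliable eta-values need many cases.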

It is evident that if we knew that curvilinear regression were due 
to the bending of only one of the two scales involved, let us say the 
y-values, we would correct only the y-values by projecting them on 
the chord of their regression line and calculating the product-moment 
correlation from the new values thus obtained. 

It sometimes happens that from the nature of one variable we can form a reasonable assumption as to which scale is deformed and which is a homogeneous linear variable. Thus time, as measured in days and years, represents a homogeneous linear variable. The rotation of the earth about its axis goes on from hour to hour at a constant rate; its angular velocity is constant. When, however, we relate age in years to growth in physical or mental characters, we usually get a curvilinear regression for the whole period of growth from its earliest period to maturity. It is evident, then, that in correcting such regressions for curvilinearity we would leave the time variable untouched, and project the means of the ability measured on the chord of the regression curve.

Figure 3. Projection of curved and straight regression lines on each other.

¹ A crude method of attaining linear regression is given by Kelley in his Statistical Method, pp. 185-191.
Figure 4. Unbroken line corrected, broken

Again, when only one of a series of variables gives curvilinear regression and all combinations of the others give linear regression, there is only one variable to be corrected.

It is clear, however, that both scales may be bent, and we may be unable to determine from the data at hand whether both manifest bending; or, if only one does so, we may be unable to decide which of the scales must be conceived of as bent. In such a case we might attempt to approximate the true value by correcting both regression curves and taking the mean of the two correlations thus obtained. Or, better still, we might correct both variables and obtain one correlation from the two series of corrected values.

One might follow the general rule that if one variable approximates a normal distribution and the other does not, we correct the one that departs definitely from a normal distribution. One finds empirically that the method of correction suggested here will at times change a markedly skew distribution into one that approximates symmetry.
Consider Figs. 4 and 5, which show the correction of the x- and y-variables in the curve of Fig. 2.¹ In both instances the very skewed distributions have approximated symmetry.
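The rule of thumb above needs some index of departure from normality. A minimal sketch using the moment coefficient of skewness as that index (our choice; the text does not name one) on hypothetical data: x is spread evenly and so is symmetric, while y = x² with a small spread is skewed to the right, so y is the variable the rule would correct. `skewness` and `variable_to_correct` are hypothetical helper names.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def variable_to_correct(xs, ys):
    """Rule of thumb from the text, with skewness standing in for
    'departure from a normal distribution': pick the more skewed variable."""
    return "x" if abs(skewness(xs)) > abs(skewness(ys)) else "y"


# Hypothetical data: y = x**2 with a spread of -2, 0, +2 in each column.
xs = [x for x in range(1, 11) for _ in range(3)]
ys = [x * x + d for x in range(1, 11) for d in (-2, 0, 2)]

print(f"skew(x) = {skewness(xs):.3f}, skew(y) = {skewness(ys):.3f}; "
      f"correct {variable_to_correct(xs, ys)}")
```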

We may now ask: What amount of departure from linearity of regression will seriously affect a correlation coefficient? To throw light on this problem we calculated the coefficient of correlation between the ordinates and abscissae of a circle with its center on the x-axis and tangent to the y-axis. This is illustrated in Fig. 6.
Figure 6. Reduction of correlation due to curvilinear regression.
Forty-five values were used in each correlation. As we go around the 
circle, the correlation decreases as shown by Fig. 7. 
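The circle calculation can be reproduced in outline. The text does not say how the 45 points were placed, so this sketch assumes they are equally spaced in angle along the arc, starting at the point where the circle touches the y-axis (the origin), on a unit circle centred at (1, 0). The helper names are ours.

```python
import math


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def circle_r(arc_degrees, n_points=45):
    """r between abscissae and ordinates of n_points spaced equally in angle
    along an arc of a unit circle centred at (1, 0), hence tangent to the
    y-axis at the origin, starting from the origin."""
    arc = math.radians(arc_degrees)
    thetas = [arc * k / (n_points - 1) for k in range(n_points)]
    xs = [1 - math.cos(t) for t in thetas]
    ys = [math.sin(t) for t in thetas]
    return pearson_r(xs, ys)


for degrees in (45, 90, 135, 180):
    print(f"arc of {degrees:3d} degrees: r = {circle_r(degrees):.3f}")
```

On these assumptions a half circle gives r = 0, since the ordinates rise and fall symmetrically while the abscissae keep increasing.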

Regressions with curvature approximating a semicircle are unknown 
in psychology. They are usually much less than 90°. When they 
are less than 45°, the reduction due to curvilinear regression is usually 
lost in the experimental error. 

When the regression curve is y = x², there is surprisingly little reduction in correlation due to curvilinearity. For this curve, unit correlation is reduced only to about 0.96-0.97.
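This figure is easy to check. Assuming x evenly spaced over [0, 1] (the range is our assumption; the text does not give one), the product-moment r between x and x² falls within the 0.96-0.97 range quoted. The sketch also computes the same curve over [1, 2] to show that the result depends on the range used.

```python
import math


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Assumption: x evenly spaced over [0, 1] in steps of 0.01.
xs = [k / 100 for k in range(101)]
r_unit = pearson_r(xs, [x ** 2 for x in xs])

# The same curve over [1, 2], where it is much closer to a straight line.
xs_shifted = [1 + k / 100 for k in range(101)]
r_shifted = pearson_r(xs_shifted, [x ** 2 for x in xs_shifted])
print(f"r over [0, 1]: {r_unit:.3f}; r over [1, 2]: {r_shifted:.3f}")
```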


¹ To exemplify the problem, the data for curvilinear regression were taken from Garrett’s Statistics in Psychology and Education, p. 207.
Kazutaro Yasukawa¹ made a theoretical investigation of the error to be expected from non-linear regression. He considered two problems: what is the difference between $r_{xy}$ and $r_{uy}$, and between $r_{xy}$ and $r_{uw}$, when u and w are such functions of x and y as powers of x (e.g., $u = kx^2$ or $u = kx^3$), $u = k/x$, logarithms such as $w = k'\log y$, etc.
Figure 7. Fall of correlation with degree of curvature.
He found that, within the limits likely to occur in anthropological measurements, the correlations of such functions of the original variables are exact within the limits of “the probable errors of most correlation coefficients.” Thus, he found a correlation between height of father and height of son of .515 ± .015. Then, by cubing the stature of the sons, he found a correlation of .514 ± .015. But if one fits a normal curve to the transformed data in the usual manner, the results are not so good, and errors may run up to seven per cent (or perhaps one per cent of the total frequency).

¹ “On the means, standard deviations, correlations and frequency distributions of functions of variates.” Biometrika, Vol. XVII, 1925, pp. 211-237.
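Yasukawa's stature example can be imitated by simulation. The sketch below does not use his data: it draws 5,000 hypothetical father-son pairs from a bivariate normal distribution with an assumed mean of 68 inches, standard deviation 2.7 inches and correlation .5, then compares r before and after cubing the sons' statures. Because stature varies only a few per cent about its mean, the cube is nearly a linear function over the observed range, and the two coefficients should differ very little.

```python
import math
import random


def pearson_r(xs, ys):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1925)           # fixed seed so the sketch is repeatable
mean, sd, rho = 68.0, 2.7, 0.5      # assumed values, not Yasukawa's
fathers, sons = [], []
for _ in range(5000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    fathers.append(mean + sd * z1)
    sons.append(mean + sd * (rho * z1 + math.sqrt(1 - rho ** 2) * z2))

r_raw = pearson_r(fathers, sons)
r_cubed = pearson_r(fathers, [s ** 3 for s in sons])
print(f"r(father, son) = {r_raw:.3f}; r(father, son cubed) = {r_cubed:.3f}")
```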

Furthermore, if one supposes a curvilinear regression to be linear and uses the ordinary regression equation to predict an individual score, considerable error will evidently result in those regions where the curve deviates furthest from the straight line. If, however, one employs the method of correction described here, it is an easy matter to pass from the linear scale back to the curvilinear scale of the original measurements.
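A sketch of that round trip, under our own assumptions and with hypothetical data: the regression line is fitted on the corrected (linear) scale, and a prediction for a given column is returned to the original scale by subtracting the offset that column received. On this construction the round trip returns the original column mean exactly, because the corrected column means lie on the chord and the least-squares line of the corrected values therefore coincides with it. The helper names are ours, and `predict_original` only works for x-values that form a column in the data.

```python
from collections import defaultdict


def chord_offsets(xs, ys):
    """Per-column offsets that move each column mean onto the chord through
    the first and last column means; also returns the original means."""
    columns = defaultdict(list)
    for x, y in zip(xs, ys):
        columns[x].append(y)
    means = {x: sum(v) / len(v) for x, v in columns.items()}
    x_first, x_last = min(means), max(means)
    slope = (means[x_last] - means[x_first]) / (x_last - x_first)
    offsets = {x: means[x_first] + slope * (x - x_first) - m for x, m in means.items()}
    return offsets, means


def fit_line(xs, ys):
    """Least-squares intercept and slope of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


# Hypothetical data: y = x**2 with a spread of -2, 0, +2 in each column.
xs = [x for x in range(1, 11) for _ in range(3)]
ys = [x * x + d for x in range(1, 11) for d in (-2, 0, 2)]

offsets, means = chord_offsets(xs, ys)
corrected = [y + offsets[x] for x, y in zip(xs, ys)]
intercept, slope = fit_line(xs, corrected)


def predict_original(x):
    """Predict on the corrected (linear) scale, then undo that column's offset."""
    return intercept + slope * x - offsets[x]


for x in (1, 5, 10):
    print(f"x = {x}: prediction {predict_original(x):.3f}, column mean {means[x]:.3f}")
```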

In 1925 Karl Pearson suggested a modified formula for the calculation of r from deviations from the means of arrays, in the way that η (eta) is calculated. He pointed out that “the disadvantages of the formula are that it applies only to normal distributions of frequency, and therefore to linear regression and Gaussian variability. Practically, however, these limitations are less important than might be anticipated ... and we do not get results substantially worse when the material is markedly skew than when the material is practically normal.”¹

It would thus seem that in many cases psychologists need not fear 
a serious clouding of their results due to the use of the product-moment 
formula in non-linear regression. There is, however, a relatively 
simple method of determining the amount of reduction in a correlation 
coefficient due to non-linear regression which should be employed when- 
ever the curve of the means indicates a marked curvilinear regression. 


SUMMARY 


Scales of measurement determine lines of regression. 

Convenience is the major reason for adopting one scale of measure- 
ment rather than another. 

For many purposes in psychology it is convenient to have linear 
regression between pairs of variables. 

When regression departs significantly from linearity, the text gives 
a simple method of changing the original scale of measurement so as 
to attain linear regression. 

In very many cases this correction will be superfluous, the error 
due to curvilinear regression lying within the limits of the probable 
error. 


¹ “On first power methods of finding correlation.” Biometrika, Vol. XVII, 1925, pp. 460-461.

REVERSAL ERRORS IN READING: PHENOMENA OF
AXIAL ROTATION


DAVID WECHSLER AND MYRTLE L. PIGNATELLI 


Bellevue ——— eet Modicine York University 

In recent years investigators have laid increasing emphasis on the importance of the type and character of errors committed by individuals suffering from reading disabilities. There is already quite a substantial literature on the subject,¹ a large part of which is devoted to an attempted classification of errors from either a theoretical or a practical point of view. These classifications often differ considerably from one another, but all of them contain a category termed “reversals.” It is this category, in many ways the most important, which we propose to discuss in the present paper.

It is necessary to begin with a definition of terms. Taken literally, the word “reversal” means “a turning over,” and in the case of reading errors it is expected to mean a turning over or reorientation of a letter or group of letters (word) about a particular axis. As actually used, however, it has come to include much more, as may be seen even from a cursory examination of illustrative examples. A partial list of “reversal” errors culled from various authors,² as well as from our own material, is shown on page 216.

¹ A good review of this literature will be found in the recent article by J. Jastak, “Interferences in Reading.” Psych. Bull., Vol. XXXI, No. 4, April, 1934, pp. 245-272.

² Orton, S. T.: “A Physiological Theory of Reading Disability and Stuttering in Children.” New England Jour. Med., 1928, pp. 199, 1047.

“An Impediment to Learning to Read.” School and Society, Vol. XXVIII, 1928, pp. 286-290.

Monroe, Marion: “Methods for Diagnosing and Treatment of Cases of Reading Disability.” Genetic Psychology Monographs, Vol. IV, 1928, p. 375.

Monroe, Marion: Children Who Cannot Read. Pp. 34-38.

Gates, Arthur I.: Improvement of Reading. New York, 1935, Appendix I, Test VIII, p. 521.

³ Monroe, M.: Children Who Cannot Read, Chicago, 1932.

The foregoing examples are far from exhausting the “errors” that have been listed as “reversals,” but even from this relatively small list it is apparent that the term “reversal” is used to designate a variety of different phenomena. A number of investigators, among them Monroe,³ have attempted to bring some order into their appraisal by classifying the errors into various groups. Monroe, for example, divides them into three main groups:

1. Reversed orientation of letters, as when b and d, p and q, or n and u are interchanged.

2. Reversed sequence of letters, as when no is read as on, and saw is read as was.

3. Reversed sequence of words, as when a child reads There once was instead of Once there was.


p read b     k read f     T read J     Q read D
b read d     g read y     G read O     F read E
p read g     y read u     W read M     J read L
h read y     Z read N     Y read U
m read w     H read N

am read ma       was read saw
on read no       ton read not
of read fo       hen read and
at read to       how read who
is read in       arms read rams
su read use      squirt read spring
ip read pit      left read felt
net read ten     dig read pit
yo read boy      ever read never

There once was       read   Once there was
He came again        read   He again came
“Mother,” he said    read   “Mother,” said he


If we examine classifications like that above critically, two things are, perhaps, immediately obvious. The first is that the classifications are descriptive, and accordingly, though useful for practical purposes, tend to throw little light upon the inner nature of the errors. As a matter of fact, some of the categories include subspecies which are neither subordinate nor capable of being logically subsumed. We will show later, for example, that the interchange of b = p, or d = q, is in no way similar to the misreading of n for u. Again, when a child reads “There once was” for “Once there was,” we are not dealing with reversals in the true sense, but merely with the displacement of words.

Perhaps the fullest elaboration of the significance of reversals will be found in the studies of Orton, in whose theories of reading the factor of left-right sequences plays a dominant role. Orton distinguishes two kinds of reversals: those involving “vertical disorientations” . . . as when u is mistaken for n, and p for b; and those involving right and left disorientations, as when d is mistaken for b, and p for q. Of these, the latter appear to be the more common, and while he also mentions other “cross disorientations,” it is the sinistral-dextral reversals which play the basic role in his theory of strephosymbolia.

It is not our purpose to enter at this time into any discussion of reading theories or of the relation of reversal errors to reading disabilities; the aim of this paper, rather, is to call attention to a variety of spatial disorientations in the configuration of letters. In this connection, the first point to be emphasized is the one made at the outset of this paper, namely, that the term “reversal,” as used at present, is too vague to be of scientific value. As now used in the classification of reading errors, it denotes a supposed change in the spatial orientation of a letter or group of letters which causes them to be confused with another letter or word because of an acquired identity or similarity in appearance with it. When, as a result of such disorientation, one letter or word is read for another, the misreading in question is called a reversal. The general stricture which may be made of such a definition is that it tells us very little of what has really happened.

An examination of even the brief list of “reversal errors” given on page 216 shows that the term “orientation” covers a variety of facts. For example, when b is read as d the visual phenomenon involved is quite different from that taking place when n is read as u, and considerably less complicated than what occurs when N is read as Z. The process common to all three is that the confused letter in each case has been rotated about some imaginary axis: b, to be changed to d, has been rotated about its vertical axis; n, when changed to u, has been rotated about its horizontal axis; and N, to produce Z, has been rotated about its depth axis. Furthermore, while b and n required a rotation of 180°, N needed only a rotation of 90°; in fact, had it been rotated 180° it would have resumed its original form, or become N again.
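The three rotations can be made concrete by treating a letter as a small bitmap. In this sketch, which is ours and not the authors', a letter is a tuple of equal-length strings ('#' for ink); rotation about the vertical axis reverses each row, rotation about the horizontal axis reverses the order of the rows, and rotation about the depth axis turns the picture in the plane of the page. The crude glyphs are hypothetical stand-ins for printed letters.

```python
def about_vertical(g):
    """Right-left reversal: each row read backwards."""
    return tuple(row[::-1] for row in g)


def about_horizontal(g):
    """Up-down reversal: the rows in reverse order."""
    return tuple(reversed(g))


def about_depth_90(g):
    """Quarter turn clockwise in the plane of the page: the new i-th row is
    the old i-th column read from bottom to top."""
    height = len(g)
    return tuple("".join(g[height - 1 - j][i] for j in range(height))
                 for i in range(len(g[0])))


def about_depth_180(g):
    """Half turn in the plane of the page (two quarter turns)."""
    return about_depth_90(about_depth_90(g))


# Hypothetical glyphs, '#' for ink; real typefaces differ in detail.
b = ("#..", "#..", "###", "#.#", "###")
d = ("..#", "..#", "###", "#.#", "###")
n = ("###", "#.#", "#.#")
u = ("#.#", "#.#", "###")
N = ("#..#", "##.#", "#.##", "#..#")
Z = ("####", "..#.", ".#..", "####")

print(about_vertical(b) == d)     # b -> d about the vertical axis
print(about_horizontal(n) == u)   # n -> u about the horizontal axis
print(about_depth_90(N) == Z)     # N -> Z by a 90 degree turn about the depth axis
print(about_depth_180(N) == N)    # a 180 degree turn brings N back to N
```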

It thus appears that in the analysis of reversals one has not only to deal with the fact of rotation, but must also consider the plane in which this rotation takes place and, at times, also the angular distance involved. In the studies up to the present most authors have assumed that the rotation occurs primarily around one of the axes, namely, the vertical axis. At least, that is the implication of their predominant preoccupation with mirror writing and the so-called sinistro-dextral inversions. Actually the rotation may take place in any one of the three spatial planes, giving rise to inversions about the horizontal and depth axes as well as about the vertical axis. In any case, the analysis of different “reversal” errors shows that in order to explain them intelligibly we must have recourse to rotation not only about one or two but about all three axes, and sometimes to a combination of two of them. Thus, to use some of the familiar examples again: b = d when rotated about its vertical axis; u = n when u is rotated about its horizontal axis; N = Z when rotated about its depth axis.
But sometimes an identical reversal may be achieved through different axial rotations. Thus one may account for the common b = q “error” either as a clockwise 180° rotation about the depth axis or by a double rotation: first about its horizontal axis, which gives b = p, then about its vertical axis, which gives the required q. Similarly, for y = h, either rotate the letter 180° counterclockwise, or rotate it first about its vertical axis (which gives h = d), and then about its horizontal axis, which gives the desired y. Furthermore, it must be remembered that rotating the same letter about different axes will cause different types of errors. Thus d becomes b when rotated about its vertical axis, but q when rotated about its horizontal axis. On the other hand, the same final “error” will occur by the rotation of different letters about different axes; thus q = d when rotated about its horizontal axis, and so does b when rotated about its vertical axis.
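With hypothetical bitmaps of the same kind, one can check that the double rotation and the single 180° turn about the depth axis give the same figure, and that one letter yields different errors under different axes. Only the two flips and the half turn are needed here, so they are written compactly; the glyphs remain crude stand-ins of our own, not reproductions of any typeface.

```python
def about_vertical(g):
    """Right-left reversal: reverse each row."""
    return tuple(row[::-1] for row in g)


def about_horizontal(g):
    """Up-down reversal: reverse the order of the rows."""
    return tuple(reversed(g))


def about_depth_180(g):
    """Half turn in the plane of the page: rows and columns both reversed."""
    return tuple(row[::-1] for row in reversed(g))


# Hypothetical glyphs, '#' for ink.
b = ("#..", "#..", "###", "#.#", "###")
d = ("..#", "..#", "###", "#.#", "###")
p = ("###", "#.#", "###", "#..", "#..")
q = ("###", "#.#", "###", "..#", "..#")

print(about_vertical(about_horizontal(b)) == q)   # double rotation b -> p -> q: True
print(about_depth_180(b) == q)                    # single half turn b -> q: True
```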

This last fact may account for the view that most reversals are sinistro-dextral, a legitimate confusion. On the other hand, to call the reading of “was” as “saw” a sinistro-dextral reversal, as is frequently done, is incorrect, because “was” reversed in a mirror does not become “saw” but “ƨɒw,” which is entirely different. The new configurations resulting from different axial rotations are not immediately apparent. Some facility at visualization is necessary, and as an aid to this we give, in Tables I and II, a record of the transformations effected in the appearance of the letters of the alphabet when rotated around their different axes.
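The difference between reversing the order of letters and mirroring a whole word can be shown with strings. In this sketch the mirror of each letter is looked up in a small table; the entries for a and s use the Unicode look-alikes ɒ and ƨ, an approximation of our own, and the hypothetical table covers only a few letters.

```python
# Hypothetical mirror table: each letter mapped to its right-left reversal.
MIRROR = {
    "b": "d", "d": "b", "p": "q", "q": "p",
    "o": "o", "w": "w", "v": "v",
    "a": "\u0252",   # ɒ, standing in for a reversed a
    "s": "\u01a8",   # ƨ, standing in for a reversed s
}


def reverse_sequence(word):
    """Sequence reversal only: the letters keep their orientation."""
    return word[::-1]


def mirror_word(word):
    """Rotation of the whole word about its vertical axis: the order of the
    letters is reversed and each letter is itself mirrored."""
    return "".join(MIRROR[c] for c in reversed(word))


print(reverse_sequence("was"))   # saw
print(mirror_word("was"))        # ƨɒw
```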

On the basis of a rotational analysis one may venture the following 
classification of reversal errors: 


1. Rotation about the vertical axis (right and left reversals), e.g., d = b; p = q; Z = S.

2. Rotation about the horizontal axis (up and down reversals), e.g., b = p; d = q; M = W; f = t.

3. Rotation about the depth axis (clockwise and counterclockwise reversals), e.g., w = m (script); d = p; Z = N; M = E.

4. Rotation of letters about two axes (double reversals), e.g., h = y; b = q (although these may also be obtained by 3).
The various types of rotation do not, of course, occur equally often, but our own experience shows that up and down reversals, contrary to general assumption, are much more frequent than left-right reversals. This would be expected from an examination of our tables, because the number of letters which are transformed by rotation into others with which they may easily be confused is very much greater when they are rotated about the horizontal than about any other axis.

TABLE I.—TRANSFORMATION OF ALPHABET RESULTING FROM AXIAL ROTATION OF LETTERS (CAPITALS)

It is, of course, not implied that axial rotation is the only factor which enters into the explanation of reversal errors. A number of others undoubtedly play a role, the most important among them being, perhaps, those of perspective and of the relation of figure to background. Altogether, we must remember that while the printed page involves only two dimensions, the reading of it, like all vision, is a tri-dimensional process. From the study of illusions we know what an important part angular relationships play. It is almost impossible to juxtapose any two lines without producing some perspective effects, especially when the abutting lines make certain angles.

TABLE II.—TRANSFORMATION OF ALPHABET RESULTING FROM AXIAL ROTATION OF LETTERS (SMALL)

Again, there is the matter of fixation. A letter may easily be “transformed” into another letter by the simple fact of fixating on a particular part of it, as, for example, when A is read as n. This may easily be achieved by fixating strongly on the lower half and thereby suppressing the upper part.
In other cases, several factors may enter into combination to
effect the perceived error. Thus when p is called d one may account
for it on the basis of a clockwise rotation about the central axis, but 
one gets the same effect more directly by focusing on the central part 
of the letter and fixating strongly on the figure rather than on the back- 
ground. When this is done the orientation of the letter on the paper 
is disregarded and the letters p and d are perceived as identical, which 
in fact, they are, when the background is disregarded. 

The above discussion will, perhaps, be sufficient to show how much more complicated the problem of “reversals” is than is implied by the statement that it consists of a change in orientation. We have in this paper confined ourselves, first, to pointing out the importance of rotation as a factor in tri-dimensional vision and, secondly, to a partial analysis of reversals in terms of the transformations of letters of the alphabet when rotated about different visual axes. The problem is even more complicated when, instead of considering letters alone, we take up groups of letters (words). At best, rotation explains only a minor part of the errors that occur. The transformations that result from rotation about different axes help to account for some, but much more work will be necessary to explain all so-called reversal errors completely.
THE EFFECTS OF PRACTICE ON INTELLIGENCE
TEST SCORES¹


DOROTHY C. ADKINS 
The Mooseheart Laboratory for Child Research 


It has been the claim of many persons engaged in the testing field that practice has little or no effect on the scores on intelligence tests. If it be true that practice plays little part in the scores on re-tests, one can safely interpret whatever increases are found as due to growth in the function being tested. As an illustration of such a claim, we may cite Kuhlmann and Anderson (4, p. 11),² who, in explaining why a second form of their intelligence test is unnecessary, say: “There is no evidence that any significant amount of practice effect remains after a year’s interval.” Hence they proceed to recommend the use of the same form of their test to determine the amount of progress children make in mental development from year to year.

That Thorndike early recognized the possible effects of practice on intelligence test scores is evident in an article published in 1919 (7). Here he recommended the construction of alternative forms of such tests, equal in difficulty but varying in content. Even with different content, however, he found a median gain of 10 per cent for second-trial scores over first and 4 per cent for third over second, the trials being given in immediate succession. He suggested the use of fore-exercises, though he pointed out that if the effects of speed and precision in understanding a novel task were thus equalized, the test might lose much of its symptomatic power.

¹ Acknowledgment is gratefully made to Dr. Martin L. Reymert, Director of the Mooseheart Laboratory for Child Research, who suggested this study and offered constant encouragement.

² The numbers in parentheses indicate references contained in the bibliography at the end.

In 1923, Thorndike explicitly pointed out that the data from repeated measurements need some allowance for the special practice in taking the tests themselves (6). He argued that the result of experience with one form of a test, say A, is shown by the average difference in scores on another form, say B, in favor of those who had taken A the previous year as compared with those taking B as the first trial; that is, the practice effect of one trial is shown by the excess of second-trial scores over first-trial scores. To find the increase due to growth and training without this special practice effect, he subtracted from the obtained gains a constant determined by averaging nine differences between first and second trials for grades ten to twelve (although the practice effects varied considerably from grade to grade).
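The arithmetic of Thorndike's correction is simple enough to set out explicitly. The sketch below uses invented means for nine first-trial and second-trial comparisons purely for illustration; none of the figures are Thorndike's, and `practice_constant` is a hypothetical helper name. The practice constant is the average of the nine second-minus-first differences, and it is subtracted from an obtained gain to estimate growth and training alone.

```python
def practice_constant(first_trial_means, second_trial_means):
    """Average excess of second-trial over first-trial means, one pair per comparison."""
    diffs = [s - f for f, s in zip(first_trial_means, second_trial_means)]
    return sum(diffs) / len(diffs)


# Hypothetical means for nine comparisons, invented for illustration only.
first = [50.0, 52.0, 55.0, 57.0, 60.0, 61.0, 63.0, 64.0, 66.0]
second = [54.0, 55.0, 60.0, 60.0, 63.0, 65.0, 66.0, 68.0, 69.0]
constant = practice_constant(first, second)

obtained_gain = 9.0                      # hypothetical gain on a repeated test
growth_estimate = obtained_gain - constant
print(f"practice constant = {constant:.2f}; estimated growth = {growth_estimate:.2f}")
```

As the text notes, a single constant ignores the considerable variation of practice effects from grade to grade.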

The danger that practice effects may obscure results where the tests under consideration are mental tests was recognized by Woolley in 1926 in her comprehensive study of children from ages fourteen to eighteen (8). However, while she exercised caution in the selection of mental tests and in interpreting results, she used no special device to account or correct for practice effects.

In the longitudinal study of growth initiated by Dearborn in 1923 (2), two group tests of intelligence have been given each year. Presumably in order to achieve some degree of comparability of successive scores and at the same time to diminish practice effects, in most years after the first, one of the group tests used the previous year has been repeated and one new test introduced (5). Lincoln points out that Cattell and Gaudet discovered that the medians for three successive performances on the Dearborn General Examination A increased with practice, and found similar results for other tests (5). That this was a complicating factor was recognized by Lincoln. He also pointed out the further difficulty which arises from the facts that the median increments do not vary consistently from year to year with the general performance level, and that some pupils have lower intelligence quotients when the tests are repeated, in contrast to the general tendency to gain. However, no constructive suggestions for removing these difficulties are contained either in the report of Lincoln or in the later report of Dearborn on the same material (3).


Obviously, if practice has no appreciable effects, the problems involved in longitudinal studies of mental growth are greatly simplified. If, however, practice does influence the results, as seems likely from the previously cited studies, it is highly important that we attempt to ascertain the extent of its effect and to devise some means of correcting for it.

The School Survey at Mooseheart provides data sufficient to 
warrant a tentative solution to this problem. One of the features of 
this program, which originated in the school year 1930-1931, is that 
at intervals roughly approximating a year, students are subjected to 
re-tests by the same instruments, among which are three tests of 
intelligence—the Kuhlmann-Anderson, the Morgan Mental, and 
the Otis. All three of these tests were administered to the
high-school students each year from 1930 to 1933, while in 1934 only the
Kuhlmann-Anderson test was used. Since it is planned to continue 
the testing program at Mooseheart over a period of some years, it was 
deemed advisable to treat the available data so as to reveal whether 
mental growth alone is exhibited by successive scores on the same test 
or whether the mental growth factor is contaminated by practice 
effects. 

Distributions of all the raw scores on the Morgan and Otis tests from 1930-1931 on, by grade and year, were made. For the Kuhlmann-Anderson test, after a preliminary use of raw scores, mental age scores were finally employed, since raw scores on this test are not strictly comparable from year to year, owing to changes in the tests. The distributions were adjusted so as to represent scores for the same groups of persons in successive years. From these distributions the means and standard deviations were computed, except for the Kuhlmann-Anderson test, where only the means of the median mental ages (in terms of which the norms are given) were found. Scores for a total of six hundred fourteen children were involved, making a total of four thousand seven hundred forty test scores included in the study.


It would have been advisable to have these computations for age 
groups rather than for grade groups, but several circumstances pre- 
vented a realization of this possibility. First, such a procedure 
requires a great deal of time; second, individuals of the same ages were 
in different grades, so that the time of testing and the intervals between 
testings varied considerably for those of the same age; third, a number 
of persons of the lower age levels were still in the elementary school, 
where the forms of the tests differed or where the tests were not given 
at all. 

As a substitute procedure, the mean scores were computed for grade 
groups. Then the average age of each group as of January 1, 1931, was 
computed and corrected by a constant representing the approximate 
interval between that time and the average date of testing for the 
group in question. These results—the means, standard deviations, 
sizes of the populations, and approximate average ages at the time of 
testing—are presented in Table I. 

The assumption will be granted, we believe, that if only mental growth is involved in increases in score on repeated tests, a random group of thirteen-year-old children tested for the second or third time should have the same average score on a test as a second random group of thirteen-year-olds taking the test for the first time; that is, the one or two extra practices of the first group should avail them nothing.


Perhaps this assumption can be visualized more clearly from Fig. 1, designed for its illustration. The units on the ordinate are purely arbitrary ratings on an imaginary test. Along the abscissa, chronological age is the variable. The slopes of the curves are also, of course, purely arbitrary and would depend, in a given instance, on the construction of the test. If practice as well as mental growth is effective in increasing test scores, curves for successive testings should build up into a hierarchy as illustrated; if, however, only mental growth is effective, all curves should coincide, except for sampling errors, with the lowest curve connecting the first-testing scores for successive age groups.

Fig. 1.—Theoretical chart to indicate the effect of practice as distinguished from growth. Numbers indicate which repetition is plotted.
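The argument of Fig. 1 can be sketched numerically. The functions below are hypothetical: the logarithmic growth curve and the shrinking practice increments are invented only so that they respect the assumptions listed in the footnote to the chart (growth slowing with age, practice gains shrinking with age and with each repetition). With practice switched on, the curves for successive testings stack into a hierarchy; with it switched off, they coincide.

```python
import math

def growth(age: float) -> float:
    """Hypothetical growth curve: rises with age, but ever more slowly."""
    return 100 * math.log(age - 5)

def practice_gain(age: float, prior_testings: int) -> float:
    """Hypothetical cumulative practice effect after `prior_testings` earlier trials.

    Each extra trial adds less than the one before (10, 5, 3.3, ...), and the
    whole effect is scaled down as age increases.
    """
    increments = [10 / (k + 1) for k in range(prior_testings)]
    return sum(increments) * (12 / age)

def expected_score(age: float, testing: int, practice: bool = True) -> float:
    """Mean score of a random age group on its `testing`-th trial (1 = first)."""
    gain = practice_gain(age, testing - 1) if practice else 0.0
    return growth(age) + gain

ages = [13, 14, 15, 16]
curves = {t: [expected_score(a, t) for a in ages] for t in (1, 2, 3)}
for t, scores in curves.items():
    print(f"testing {t}:", [round(s, 1) for s in scores])
```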

To obtain an idea as to the nature of the actual situation, graphs similar to this theoretical one were constructed for each test. The outstanding difference is that the theoretical curves assumed random age groups tested at yearly intervals, while for the empirical curves we had available only groups of a given approximate average age at the time of testing, who were in grades seven to twelve, inclusive, in 1930-1931, and in the next higher grades in succeeding years. While this difference does weaken somewhat our conclusions, it does not provide sufficient justification for discarding them, when other factors are taken into consideration.

1 In constructing the chart, the following assumptions were implicitly involved: (1) that mental growth rate decreases with age; (2) that the effect of additional practice decreases with age; (3) that the practice effect diminishes with successive repetitions. That these assumptions are fairly well substantiated by our data will be evident from inspection of the empirical charts later presented. None of these assumptions, however, is essential to the general nature of the argument.

TABLE I.—Average Scores, Standard Deviations, N's, and Average Age of Groups at the Time of Testing, for the Kuhlmann-Anderson, Morgan Mental, and Otis Intelligence Tests, for Groups Which Were in the Seventh, Eighth, Ninth, Tenth, Eleventh, and Twelfth Grades in 1930-1931 and in Succeeding Grades in Subsequent Years

N   | Grade, year | Morgan M (total score) | Morgan σ | Morgan age | Otis M (total score) | Otis σ | Otis age | Kuhlmann-Anderson M (MA)¹ | Kuhlmann-Anderson age
----|-------------|--------|-------|-------|--------|-------|-------|-------|------
77  | 7, 1930     | 60.31  | 12.45 | 13.13 | 92.91  | 15.22 | 12.97 | 12.94 | 13.22
    | 8, 1931     | 74.40  | 14.92 | 14.09 | 115.38 | 18.61 | 14.09 | 13.86 | 14.30
    | 9, 1932     | 85.38  | 15.90 | 15.22 | 135.05 | 20.08 | 15.25 | 14.76 | 15.22
    | 10, 1933    |        |       |       |        |       |       | 16.31 | 16.38
80  | 8, 1930     | 71.12  | 15.98 | 13.32 | 105.62 | 19.64 | 13.37 | 13.80 | 13.37
    | 9, 1931     | 88.91  | 17.94 | 14.40 | 132.79 | 21.58 | 14.24 | 15.63 | 14.32
    | 10, 1932    | 100.40 | 21.21 | 15.37 | 152.25 | 24.58 | 15.37 | 16.59 | 15.37
    | 11, 1933    |        |       |       |        |       |       | 17.28 | 16.53
92  | 9, 1930     | 80.48  | 18.07 | 14.90 | 106.02 | 21.11 | 13.28 | 15.06 | 14.95
    | 10, 1931    | 98.96  | 18.09 | 15.98 | 140.86 | 23.55 | 15.82 | 16.19 | 15.86
    | 11, 1932    | 107.96 | 21.11 | 16.95 | 158.83 | 24.50 | 16.98 | 17.30 | 16.95
    | 12, 1933    |        |       |       |        |       |       | 18.59 | 18.11
121 | 10, 1930    | 87.62  | 18.99 | 16.00 | 115.60 | 23.87 | 14.38 | 15.48 | 16.05
    | 11, 1931    | 104.24 | 23.56 | 17.08 | 152.04 | 24.00 | 16.92 | 16.65 | 17.05
    | 12, 1932    | 112.22 | 24.70 | 18.17 | 162.40 | 25.03 | 18.08 | 17.39 | 18.05
143 | 11, 1930    | 95.04  | 18.50 | 17.15 | 124.07 | 23.89 | 14.99 | 16.35 | 17.16
    | 12, 1931    | 109.24 | 22.17 | 18.15 | 155.04 | 24.02 | 17.50 | 17.26 | 18.16
101 | 12, 1930    | 91.36  | 19.71 | 18.51 | 126.46 | 22.56 | 16.35 | 15.79 | 18.52

¹ Mental ages instead of raw scores were used here for the Kuhlmann-Anderson test as explained in the text. Since the means and standard deviations were computed originally using raw scores, it was not considered essential for the purpose at hand to recompute standard deviations in terms of mental ages nor to publish those found in terms of raw scores.
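The totals stated earlier (614 children, 4,740 test scores) can be reconciled with Table I by multiplying each group's N by the number of filled cells in its rows. The per-group testing counts below are read off the table; the check is only an arithmetic illustration.

```python
# (grade in 1930-1931, N, Morgan testings, Otis testings, Kuhlmann-Anderson testings)
GROUPS = [
    (7, 77, 3, 3, 4),
    (8, 80, 3, 3, 4),
    (9, 92, 3, 3, 4),
    (10, 121, 3, 3, 3),
    (11, 143, 2, 2, 2),
    (12, 101, 1, 1, 1),
]

children = sum(n for _, n, *_ in GROUPS)
scores = sum(n * (m + o + k) for _, n, m, o, k in GROUPS)
print(children, scores)  # 614 4740
```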

Figure 2 presents such a chart for the Morgan Mental Test. The disconnected lines connect average scores for first testings, for second testings, and for third testings, the data having been drawn from Table I. The curve for the third trial is, it will be noted, above that for the second, and that for the second above that for the first. The curves for repeated testings follow the general contour of the curves depicting both growth and practice in Fig. 1. Differences in the heights of the ordinates for a given value of the abscissa show the effects of one repetition, two repetitions, and so on.

Fig. 2.—The Morgan mental test; average scores for several groups on repetition of the test.

It will be noted that the eighth-grade group for 1930-1931 has an average age only slightly greater than that of the seventh grade and yet has considerably higher average scores. Obviously, the eighth grade is a brighter group than the seventh grade of the same year. In order to make our results more nearly approach those for random age groups, the results for the seventh and eighth grades were combined and the weighted average test score and age were computed for each testing. The curve marked seventh and eighth shows the results of this combination and demonstrates even more clearly that practice is affecting the results. The twelfth-grade group in 1930-1931 has a considerably low average score for its age on this test, as well as on the other two; however, no correction was made for the evident dullness of this group.
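The pooling of the seventh and eighth grades can be reproduced from Table I. This is a minimal sketch, assuming the weights are the group sizes (77 and 80) and pairing the two groups' first, second, and third testings; only the Morgan figures are shown, and the helper name `weighted` is illustrative.

```python
# Morgan Mental Test means and ages from Table I, by testing (1st, 2nd, 3rd).
seventh = {"n": 77, "mean": [60.31, 74.40, 85.38], "age": [13.13, 14.09, 15.22]}
eighth = {"n": 80, "mean": [71.12, 88.91, 100.40], "age": [13.32, 14.40, 15.37]}

def weighted(a: float, na: int, b: float, nb: int) -> float:
    """Mean of two groups pooled, weighting each by its size."""
    return (na * a + nb * b) / (na + nb)

combined = [
    (
        weighted(seventh["mean"][i], seventh["n"], eighth["mean"][i], eighth["n"]),
        weighted(seventh["age"][i], seventh["n"], eighth["age"][i], eighth["n"]),
    )
    for i in range(3)
]
for i, (score, age) in enumerate(combined, start=1):
    print(f"testing {i}: mean {score:.2f} at age {age:.2f}")
```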

Figure 3 shows the situation for the Otis test. Here, as with the Morgan test, the effects of practice are clearly demonstrated.

Fig. 3.—The Otis group test of intelligence; average scores for several groups on repetition of the test.
Results for the Kuhlmann-Anderson test are presented in Fig. 4. They are given in terms of mental age rather than raw score because of the non-comparability of the raw scores for different grade groups. The interpretation here is not very clear, because of the nature of the construction of the Kuhlmann-Anderson norms. For a single sub-test, the average mental age for a random group of a given chronological age should be equal to the chronological age (assuming a normal environment). But whether the average median mental age derived from a number of such sub-tests will equal the chronological age is a question which cannot be answered from the data at hand. If it were true, the curve of growth should be represented by a straight line connecting equal mental and chronological ages, instead of by the lower disconnected line. Unless it is true that in general the students are considerably duller as school grade increases (a supposition at variance with customary experience), the writer does not believe that such an interpretation can stand. While further research is needed to make the conclusion absolute, it seems that here, too, practice is effective.
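One way to see how far the first-testing line lies below the line of equal mental and chronological ages is to tabulate MA minus CA for each group's first Kuhlmann-Anderson testing, using the figures in Table I. This sketch only restates the table; it does not settle which interpretation is correct.

```python
# First-testing Kuhlmann-Anderson median mental age (MA) and chronological
# age (CA) for each 1930-1931 grade group, read from Table I.
first_testing = {
    7: (12.94, 13.22),
    8: (13.80, 13.37),
    9: (15.06, 14.95),
    10: (15.48, 16.05),
    11: (16.35, 17.16),
    12: (15.79, 18.52),
}

for grade, (ma, ca) in first_testing.items():
    # A negative gap means the group falls below the line MA = CA.
    print(f"grade {grade}: MA - CA = {ma - ca:+.2f} years")
```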
Fig. 4.—The Kuhlmann-Anderson test; average median mental ages for several groups on repetition of the test.


If we were to construct norms on an age basis, it would be possible to apply an average correction to individual curves formed by connecting scores on repetitions of the same test. Corrections could be found by taking the differences of the averages made by two random groups of the same age as the person in question, one having taken the test once and the other twice, in the case of the first re-test for the individual. For his second repetition, we could correct his score by an amount equal to the difference of the averages made by a group having taken the test twice and a group having taken the test three times, and so on. This procedure would involve only average corrections and might conceivably be improved by taking into account, for a given individual, the average gain made by persons of the same standing on the first test as himself or by using some other empirical correction. However, even such a simple method as that here suggested may be expected to give us a more nearly true picture of the mental growth of an individual than will the ordinary custom of simply neglecting the effects of practice or of assuming that they are adequately disposed of by employing so-called duplicate forms of the same test.

* See discussion p. 9.
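The correction procedure can be sketched as follows. Everything here is hypothetical: the norm table `NORMS` holds invented means (the random age groups the method requires were not available in this study), and the helper names are illustrative. The sketch also assumes that the correction for each repetition is looked up at the pupil's age at that testing and that corrections for earlier repetitions carry forward to later scores.

```python
from typing import Dict, List, Tuple

# Hypothetical age norms: mean score of random groups of a given age who have
# taken the test 1, 2 or 3 times. Invented numbers, for illustration only.
NORMS: Dict[Tuple[int, int], float] = {
    (13, 1): 72.0, (13, 2): 84.0, (13, 3): 90.0,
    (14, 1): 80.0, (14, 2): 91.0, (14, 3): 97.0,
    (15, 1): 88.0, (15, 2): 98.0, (15, 3): 103.0,
}

def repetition_correction(age: int, testing: int) -> float:
    """Average practice gain for the `testing`-th administration at `age`.

    Difference between the mean of same-age groups that had taken the test
    `testing` times and those that had taken it `testing - 1` times.
    """
    if testing == 1:
        return 0.0
    return NORMS[(age, testing)] - NORMS[(age, testing - 1)]

def corrected_scores(history: List[Tuple[int, float]]) -> List[float]:
    """Subtract accumulated average practice effects from an individual's scores.

    `history` lists (age at testing, raw score) in the order the tests were taken.
    """
    corrected, total = [], 0.0
    for testing, (age, score) in enumerate(history, start=1):
        total += repetition_correction(age, testing)
        corrected.append(score - total)
    return corrected

pupil = [(13, 75.0), (14, 90.0), (15, 101.0)]
print(corrected_scores(pupil))  # [75.0, 79.0, 85.0]
```

Here the pupil's raw gain of 26 points over two years shrinks to a corrected gain of 10 points.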

Inspection of Table I reveals that there is an increase in standard 
deviations with successive testings. Although the method of treat- 
ment and interpretation which has been used is applicable to standard 
deviations as well as to means, the writer considers it necessary only 
to point out such a possibility and the strong likelihood that the 
resulting conclusion would be quite analogous to that involving the 
means—that is, that a part of the increase in range of scores on intelli- 
gence tests is due to practice, indicating that in general the practice 
effect is greater for those persons who made higher initial scores. 
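The increase in standard deviations can be checked mechanically against Table I. The sketch below only confirms the descriptive observation for the Morgan and Otis tests (the twelfth grade, tested once, is omitted, and the Kuhlmann-Anderson SDs were not published); it does not separate growth from practice.

```python
# Standard deviations from Table I for each grade group, in order of testing.
MORGAN_SD = {
    7: [12.45, 14.92, 15.90],
    8: [15.98, 17.94, 21.21],
    9: [18.07, 18.09, 21.11],
    10: [18.99, 23.56, 24.70],
    11: [18.50, 22.17],
}
OTIS_SD = {
    7: [15.22, 18.61, 20.08],
    8: [19.64, 21.58, 24.58],
    9: [21.11, 23.55, 24.50],
    10: [23.87, 24.00, 25.03],
    11: [23.89, 24.02],
}

def rises_with_testing(sds):
    """True if each successive testing has a larger SD than the one before."""
    return all(later > earlier for earlier, later in zip(sds, sds[1:]))

for name, table in (("Morgan", MORGAN_SD), ("Otis", OTIS_SD)):
    print(name, {grade: rises_with_testing(sds) for grade, sds in table.items()})
```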

In a previous paper (1), the writer has suggested the desirability 
of constructing age norms for a specific environment where an extended 
longitudinal survey of mental growth is planned. The present research 
serves to emphasize this need. The mental age norms usually pro- 
vided with various tests, being derived in different ways and from a 
variety of populations—sometimes with an insufficient number of 
cases—are not comparable and render the problem of interpretation 
of results more difficult. 

This study indicates throughout the desirability of designating a given month of the year as a testing period, so that the intervals between successive testings will be constant, thus obviating the necessity of approximating the average age at the time of testing.

It is quite possible that the appearance of increases in score on repeated tests is due largely to the time limits imposed. It seems plausible to the writer that in intelligence tests, composed mainly of items of the problem-solving sort, the effect of practice may be due largely to an increase in speed on the items previously taken (as Thorndike suggested, 7) and on items of a similar sort, rather than to an increase in the ability involved in performing correctly operations more difficult or "non-comparable." That is to say, there is perhaps an increase in speed rather than in power. If this be granted, the increases in score on work-limit tests should be due in large part to mental growth rather than to practice; hence it follows that for a longitudinal study of mental growth, the work-limit method of administration of tests perhaps should be given more extensive trial.


It remains to be pointed out that both the rate of mental growth* and the amount of improvement with practice are highly individual factors. While it is characteristic of statistical studies that they deal in averages, it cannot be over-emphasized that the general result cannot be expected to hold for every individual. Such studies, however, do provide us with bases for better "guesses" with reference to the performance of individuals in the absence of further knowledge of other factors. On the other hand, it must be recognized that it is the individual difference which makes clinical psychology a possibility, and a necessity.
THE k AND g METHODS OF INTERPRETING THE
COEFFICIENT OF CORRELATION 


J. D. HEILMAN 
Colorado State College of Education 


One method of interpreting a coefficient of correlation makes use of k, which is equal to $\sqrt{1 - r^2}$. This method was introduced by T. L. Kelley. k represents the percentage which the error of the estimated scores is of the error which results when all of the estimated scores are placed at the mean of the distribution. Placing each score at the mean of the distribution is equivalent to the best guess. The scores on one variable may be estimated from those on another variable when the relation between the two variables, in terms of a coefficient of correlation, is known. The regression equation or the line of means may be used for this purpose.
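A small function makes the definition concrete. The name `coefficient_of_alienation` is the customary label for k, used here for readability; the printed values are simply $\sqrt{1 - r^2}$ and $1 - k$ for a few illustrative coefficients.

```python
import math

def coefficient_of_alienation(r: float) -> float:
    """Kelley's k = sqrt(1 - r^2): error of estimate relative to a best guess."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return math.sqrt(1.0 - r * r)

for r in (0.0, 0.5, 0.9, 1.0):
    k = coefficient_of_alienation(r)
    print(f"r = {r:.2f}  k = {k:.3f}  improvement over best guess = {1 - k:.1%}")
```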

The meaning of k may be illustrated with the aid of the following correlation tables:

[Correlation Tables A, B, and C]


Correlation Table A represents a perfect correlation between the scores of the x and y variables. All four cases who received the highest score on the x variable also received the highest score on the y variable. Those who received the second highest score on the x variable received the second highest on the y variable, and so on. Therefore, the score made on the y variable by any student may be predicted from the score on the x variable without any error. In consequence, the ratio of the error of the y-scores estimated from the x-scores to the error of the scores estimated at the mean of distribution y is zero. When this ratio is zero the agreement between the two series of scores is perfect and r is 1.00; that is, when r is one the improvement in the accuracy of the y-scores estimated from the x variable over the accuracy of the scores when placed at the mean of distribution y is one hundred per cent.

Correlation Table B represents a zero correlation. In estimating the values on the y variable from those on the x variable, the best possible procedure is to find the mean of each column, because those who made a given score on the x variable made all possible scores on the y variable. Using the means of the columns to estimate the values on the y variable will yield the mean score of three on the y variable for all of the values on the x variable. Therefore, the error of these estimated scores is the same as the error due to the best guess, which is also at the mean of distribution y. In consequence, the ratio of the former to the latter, or k, is equal to one. k, then, is equal to one when r is zero. The agreement between the two series of scores is that of a best guess, or a zero agreement beyond that of a best guess. When r is zero the improvement in the accuracy of the y-scores estimated from the x variable over the accuracy of the scores when placed at the mean of distribution y is zero per cent.

Correlation Table C represents a coefficient of correlation of .50. As in the case of Correlation Table B, the values on the y variable must be estimated from the means of the columns. The best estimate for the value of five on the x variable is four on the y variable; the best estimate for four on the x variable is 3.5 on the y variable, and so on. These estimated values are in error for some of the cases. For a value of five on the x variable the estimated value of four on the y variable is accurate for two cases and one point in error for each of the other two cases. The errors for all of the estimations may be found by taking the differences between the estimated and the actual values for all of the cases or frequencies. The SD of these errors, which may be used to express their extent, is .867. The SD of the errors resulting from a best guess, or from estimating all of the values at the mean of distribution y, is one. Therefore, when r is .50 the improvement in the accuracy of the y-scores estimated from the x variable over the accuracy of the scores when placed at the mean of the distribution is (1 − .867)/1, or 13.3 per cent; that is, the departure from a perfect agreement between the two series of scores, when a zero agreement is regarded as that due to a best guess, is 86.7 per cent, and the agreement beyond a best guess is 13.3 per cent.
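The arithmetic for Correlation Table C can be set beside the formula. The two error SDs are the values reported in the text; the sketch shows that their ratio, .867, is close to $\sqrt{1 - .50^2} \approx .866$ and reproduces the 13.3 per cent figure.

```python
import math

r = 0.50
sd_best_guess = 1.0   # SD of errors when every y is placed at the mean (Table C)
sd_estimate = 0.867   # SD of errors of the column-mean estimates (Table C)

empirical_k = sd_estimate / sd_best_guess
formula_k = math.sqrt(1 - r**2)
improvement = 1 - empirical_k

print(f"empirical k = {empirical_k:.3f}; sqrt(1 - r^2) = {formula_k:.4f}")
print(f"improvement over a best guess = {improvement:.1%}")
```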


relationship. However, g, or $\sqrt{\dfrac{1 - r^2}{2}}$, expresses a relation which includes the part neglected by k. g, or more precisely $1 - g$, represents the relationship which exists exclusive of that due to chance. The author of this method of interpreting a coefficient of correlation is C. W. Odell.

The ratio of the error of the estimated values to the error of a mere 
guess is smaller than the ratio of the error of the estimated values to 
that of a best guess. Making use of Correlation Table C the two may 
be compared. The latter ratio has already been found. It is .867 as 
stated above. To find the former ratio use may be made of the error 
of the estimated scores, .867, which was computed to find k. The 
error due to a mere guess may be found in Table I. 

In Table I the distribution of scores as given in Correlation Table C 
was used to compute the amount of error in the scores when they were 
assigned on the basis of a mere guess; i.e., it was assumed that the
sixty-four students involved made scores as given in the distribution, 
but the assignment was made by placing the scores in random order in 
a hat and having the students draw their scores in the alphabetical 
order of the initial letters of their last names. 
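Why the error of a mere guess should come out at about $\sqrt{2}$ times the SD can be shown directly. When scores are drawn at random, each student is equally likely to receive any score in the distribution, including his own, so the expected mean squared error is the average of $(y_i - y_j)^2$ over all ordered pairs, which equals $2\sigma^2$. The distribution below is hypothetical (64 scores with SD exactly 1, in the spirit of Table C, which is not reproduced); the function computes the expected error exactly rather than by drawing from a hat.

```python
import math
import statistics

def mere_guess_error_sd(scores):
    """RMS error when scores are handed out at random (expected over all draws).

    Under a random assignment each person is equally likely to receive any
    score in the list, so the expected mean squared error is the average of
    (y_i - y_j)**2 over all ordered pairs (i, j), including i == j.
    """
    n = len(scores)
    total = sum((a - b) ** 2 for a in scores for b in scores)
    return math.sqrt(total / (n * n))

# Hypothetical distribution of 64 scores on a 1-5 scale with mean 3 and SD 1.
y = [1] * 4 + [2] * 16 + [3] * 24 + [4] * 16 + [5] * 4
print(statistics.pstdev(y), mere_guess_error_sd(y))
```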

The amount of error on the chance assignment is expressed in terms of the standard deviation, which was found to be 1.4142. The amount of error, as expressed in terms of the standard deviation, on the scores of the y variable estimated from the x variable is .867. The ratio of the latter to the former value is .6128. This is an expression of the percentage of departure from a perfect correlation for a coefficient of .50 when a mere guess is regarded as a zero relationship.

The value obtained from the formula $\sqrt{\dfrac{1 - r^2}{2}}$, with r = .50, is .6121, which agrees with the ratio value of .6128 to the third place.
If, then, it is desired to know the percentage of departure from a perfect agreement between two series of scores beyond a mere chance agreement for a given coefficient of correlation, the formula

$$g = \sqrt{\frac{1 - r^2}{2}}$$

should be used. Moreover, $1 - \sqrt{\dfrac{1 - r^2}{2}}$, or in this case .388, is an expression of the per cent of improvement in the estimated scores over a chance estimate.

Because for Correlation Table C the SD for the errors of the scores estimated at the mean is one and the SD for the errors of the scores when assigned by chance is 1.414, we may find how much better, in terms of percentage, a best guess is than a mere guess. This may be found by dividing 1.414 minus one by 1.414. The result is 29 per cent.
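The quantities in the last few paragraphs follow from $g = \sqrt{(1 - r^2)/2}$, that is, $k/\sqrt{2}$. A sketch reproducing them for r = .50, with the two Table C error SDs taken from the text (the computed g, .6124, differs slightly from the hand-computed .6121):

```python
import math

def g_ratio(r: float) -> float:
    """Odell's g = sqrt((1 - r^2) / 2): error of estimate relative to a mere guess."""
    return math.sqrt((1.0 - r * r) / 2.0)

r = 0.50
g = g_ratio(r)
print(f"g = {g:.4f}, 1 - g = {1 - g:.4f}")

# Table C error SDs quoted in the text: estimates at the mean vs chance assignment.
sd_best, sd_mere = 1.0, 1.414
best_over_mere = (sd_mere - sd_best) / sd_mere
print(f"a best guess beats a mere guess by {best_over_mere:.0%}")
```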

In conclusion, it may be said that when a best guess relationship 
between two series of scores is regarded as zero then k is the percentage 
of disagreement and 1 — k the percentage of agreement between the 
two series, but when a mere chance relationship between two series is 
regarded as zero then g is the percentage of disagreement and 1 — g 
the percentage of agreement between the two series. 










SHORT-CUT METHODS FOR CALCULATING RAW AND 
CORRECTED CORRELATIONS BETWEEN A 
COMPOSITE VARIABLE AND ITS COMPONENTS 


EDWIN E. GHISELLI
Harvard University

AND

GEORGE KUZNETS
University of California


In working with certain problems in the field of tests and measure- 
ments one may be confronted with the task of determining the relation- 
ships between a component or a sum of components of a composite 
variable, and the composite. It is often desirable to determine the 
reliability of the measures under consideration. Knowing these 
reliabilities the relationships between the measures may be corrected 
for errors of measurement. 

For example, in constructing a test consisting of a number of 
subtests it may be important to know how well a single subtest or the 
sum of a group of subtests correlates with the total, how well a subtest 
or the sum of a group of subtests correlates with the rest of the test, the
reliabilities of the subtests, of groups of subtests, and of the whole 
test, and, finally, the true relationships between the variables under 
consideration. 

A complete set of relationships may, of course, be obtained by straightforward calculation of all the coefficients involved. It is possible, however, to lessen considerably the labor of computation by judicious use of certain formulae which will here be enumerated. It is to be noted that these short-cuts entail no assumptions other than those required by the use of the data and a few basic formulae now in general use.


1. UNCORRECTED COEFFICIENTS 


Let T represent the composite variable made up of a + b components, A a part of the composite consisting of elements 1 to a, and B the remaining elements a + 1 to a + b. Thus, T = A + B. The minimal number of constants required for determining the complete set of relationships is (1) $\sigma_A$, $\sigma_B$, $r_{AB}$, or (2) $\sigma_A$, $\sigma_T$, $r_{AT}$. Since in any particular investigation it may be convenient to use one set of constants rather than another, formulae for each are given.
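Case 1.—Knowing $r_{AB}$, $\sigma_A$, and $\sigma_B$, find $r_{AT}$.

$$r_{AT} = r_{A(A+B)} = \frac{\sigma_A + \sigma_B r_{AB}}{\sqrt{\sigma_A^2 + \sigma_B^2 + 2\sigma_A\sigma_B r_{AB}}} \tag{1}$$

Case 2a.—Knowing $r_{AT}$, $\sigma_A$, and $\sigma_T$, find $r_{AB}$.

$$r_{AB} = r_{A(T-A)} = \frac{\sigma_T r_{AT} - \sigma_A}{\sqrt{\sigma_T^2 + \sigma_A^2 - 2\sigma_A\sigma_T r_{AT}}} \tag{2}$$

Case 2b.—Knowing $r_{AT}$, $\sigma_A$, and $\sigma_T$, find $r_{BT}$.

$$r_{BT} = r_{(T-A)T} = \frac{\sigma_T - \sigma_A r_{AT}}{\sqrt{\sigma_T^2 + \sigma_A^2 - 2\sigma_A\sigma_T r_{AT}}} \tag{3}$$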
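Formulae (1) to (3) translate directly into code. The constants in the usage lines are illustrative, not from any study, and the function names are hypothetical; the round trip checks that formula (2) recovers the $r_{AB}$ we started from and that formula (3) agrees with formula (1) applied with A and B interchanged.

```python
import math

def r_at_from_ab(r_ab, s_a, s_b):
    """Formula (1): correlation of part A with the total T = A + B."""
    s_t = math.sqrt(s_a**2 + s_b**2 + 2 * s_a * s_b * r_ab)
    return (s_a + s_b * r_ab) / s_t

def r_ab_from_at(r_at, s_a, s_t):
    """Formula (2): correlation of A with the remainder B = T - A."""
    s_b = math.sqrt(s_t**2 + s_a**2 - 2 * s_a * s_t * r_at)
    return (s_t * r_at - s_a) / s_b

def r_bt_from_at(r_at, s_a, s_t):
    """Formula (3): correlation of the remainder B = T - A with T."""
    s_b = math.sqrt(s_t**2 + s_a**2 - 2 * s_a * s_t * r_at)
    return (s_t - s_a * r_at) / s_b

# Illustrative constants: go from (r_AB, sigma_A, sigma_B) to r_AT and back.
r_ab, s_a, s_b = 0.40, 6.0, 9.0
s_t = math.sqrt(s_a**2 + s_b**2 + 2 * s_a * s_b * r_ab)
r_at = r_at_from_ab(r_ab, s_a, s_b)
print(round(r_at, 4), round(r_ab_from_at(r_at, s_a, s_t), 4), round(r_bt_from_at(r_at, s_a, s_t), 4))
```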


If A and B each consist of one element, formulae (1), (2), and (3) remain unchanged. However, in the case where A consists of one element and B consists of the sum of elements 2 to c + 1, the following modification of formula (1) will lessen somewhat the labor of computation.

$$r_{AT} = \frac{\sigma_1 + \sigma_2 r_{12} + \cdots + \sigma_{c+1} r_{1(c+1)}}{\sqrt{\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_{c+1}^2 + 2\sigma_1\sigma_2 r_{12} + \cdots + 2\sigma_c\sigma_{c+1} r_{c(c+1)}}} \tag{4}$$

In this case formulae (2) and (3) would not be altered.


In Case 1, when A and B each consist of more than one element, usually $\sigma_A$, $\sigma_B$, and $r_{AB}$ are not known, but only the variances of the individual elements comprising A and B, and the intercorrelations between them. It is simpler to compute the variances of A and B from the variance of a sum formula, which involves knowledge of the variances of the elements and the intercorrelations between them, and $r_{AB}$ from the formula

$$r_{AB} = r_{[1+\cdots+a][(a+1)+\cdots+(a+b)]} = \frac{\sigma_1\sigma_{a+1} r_{1(a+1)} + \cdots + \sigma_a\sigma_{a+b} r_{a(a+b)}}{\sigma_{[1+\cdots+a]}\,\sigma_{[(a+1)+\cdots+(a+b)]}} \tag{5}$$

(the numerator running over all ab pairs formed by one element of A and one element of B) than to use the general expression for $r_{[1+\cdots+a][1+\cdots+a+(a+1)+\cdots+(a+b)]}$, since the latter involves computation of $\sigma_T^2$ from the variances of the elements and the intercorrelations between them. Furthermore, it will be seen later that if it is necessary to correct the correlations for attenuation, using formula (1), (2), or (3) will greatly simplify the task.
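The route recommended here (variance of a sum for $\sigma_A$ and $\sigma_B$, formula (5) for $r_{AB}$, then formula (1)) can be checked against the direct computation. The four elements, their SDs, and their intercorrelations are hypothetical, as are the helper names; the last lines also apply formula (4) with element 1 (index 0) as A.

```python
import math
from itertools import product

# Hypothetical element SDs and intercorrelations (R[i][i] = 1).
SD = [4.0, 5.0, 3.0, 6.0]
R = [
    [1.00, 0.50, 0.30, 0.20],
    [0.50, 1.00, 0.40, 0.35],
    [0.30, 0.40, 1.00, 0.45],
    [0.20, 0.35, 0.45, 1.00],
]

def cov_between(g1, g2):
    """Sum of sigma_i * sigma_j * r_ij over i in g1 and j in g2."""
    return sum(SD[i] * SD[j] * R[i][j] for i, j in product(g1, g2))

def sd_of_sum(group):
    """Variance-of-a-sum formula (covariance of the group with itself), square-rooted."""
    return math.sqrt(cov_between(group, group))

A, B = [0, 1], [2, 3]  # A = elements 1..a, B = elements a+1..a+b
T = A + B
s_a, s_b = sd_of_sum(A), sd_of_sum(B)
r_ab = cov_between(A, B) / (s_a * s_b)                    # formula (5)
s_t = math.sqrt(s_a**2 + s_b**2 + 2 * s_a * s_b * r_ab)
r_at = (s_a + s_b * r_ab) / s_t                           # formula (1)

# General expression: correlate A with every element of T directly.
r_at_direct = cov_between(A, T) / (s_a * sd_of_sum(T))

# Formula (4): a single element (index 0) against the total of all elements.
r_1t = (SD[0] + sum(SD[j] * R[0][j] for j in T[1:])) / sd_of_sum(T)

print(round(r_ab, 4), round(r_at, 4), round(r_at_direct, 4), round(r_1t, 4))
```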


In Cases 2a and 2b, when A and B each consist of more than one element, usually only the variances of and the intercorrelations between the elements comprising A, and the correlations between T and the elements of A, are known. Here again, it will be simpler to use the variance of a sum formula to compute $\sigma_A^2$, and the following formula for $r_{AT}$.

$$r_{AT} = r_{(1+\cdots+a)T} = \frac{\sigma_1 r_{1T} + \cdots + \sigma_a r_{aT}}{\sigma_A} \tag{6}$$
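Formula (6) can be sketched the same way. The element SDs and intercorrelations are hypothetical; in practice each $r_{iT}$ would be observed, but here it is derived from the same hypothetical matrix so that the result can be checked against the covariance of A with T.

```python
import math

# Hypothetical element SDs and intercorrelations for a four-element composite T.
SD = [4.0, 5.0, 3.0, 6.0]
R = [
    [1.00, 0.50, 0.30, 0.20],
    [0.50, 1.00, 0.40, 0.35],
    [0.30, 0.40, 1.00, 0.45],
    [0.20, 0.35, 0.45, 1.00],
]
A, T = [0, 1], [0, 1, 2, 3]  # part A = first two elements

def cov(i, j):
    """Covariance of elements i and j."""
    return SD[i] * SD[j] * R[i][j]

# Correlations of A's elements with T (observed in practice; derived here).
s_t = math.sqrt(sum(cov(i, j) for i in T for j in T))
r_it = {i: sum(cov(i, j) for j in T) / (SD[i] * s_t) for i in A}

# Variance-of-a-sum formula for sigma_A, then formula (6).
s_a = math.sqrt(sum(cov(i, j) for i in A for j in A))
r_at = sum(SD[i] * r_it[i] for i in A) / s_a

# Direct check: covariance of A with T over sigma_A * sigma_T.
r_at_direct = sum(cov(i, j) for i in A for j in T) / (s_a * s_t)
print(round(r_at, 4), round(r_at_direct, 4))
```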
2. RELIABILITY COEFFICIENTS


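It has been shown¹ that the reliability coefficient may be thought of as the percentage degree of determination of the fallible variance by the true variance, or, in the usual notation,

$$r_{TT} = \frac{\sigma_{T_\infty}^2}{\sigma_T^2} = \frac{\sigma_{A_\infty}^2 + \sigma_{B_\infty}^2 + 2\sigma_{A_\infty}\sigma_{B_\infty} r_{A_\infty B_\infty}}{\sigma_A^2 + \sigma_B^2 + 2\sigma_A\sigma_B r_{AB}}$$

Substituting for the true variances their equivalents in fallible variances ($\sigma_{A_\infty}^2 = \sigma_A^2 r_{AA}$, $\sigma_{B_\infty}^2 = \sigma_B^2 r_{BB}$) and expressing $r_{A_\infty B_\infty}$ in terms of $r_{AB}$ (errors being uncorrelated, $\sigma_{A_\infty}\sigma_{B_\infty} r_{A_\infty B_\infty} = \sigma_A\sigma_B r_{AB}$),

$$r_{TT} = \frac{\sigma_A^2 r_{AA} + \sigma_B^2 r_{BB} + 2\sigma_A\sigma_B r_{AB}}{\sigma_A^2 + \sigma_B^2 + 2\sigma_A\sigma_B r_{AB}} \tag{7}$$

Hence, knowing $r_{AA}$, $r_{BB}$, and $r_{AB}$ (together with $\sigma_A$ and $\sigma_B$), one can compute $r_{TT}$. It also follows from formula (7) that if $r_{TT}$, $r_{BB}$, and $r_{AB}$ are known, $r_{AA}$ can be computed:

$$r_{AA} = \frac{\sigma_T^2 r_{TT} - \sigma_B^2 r_{BB} - 2\sigma_A\sigma_B r_{AB}}{\sigma_A^2} \tag{8}$$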
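Formulae (7) and (8) are inverses of one another, which a short sketch can confirm. The constants are illustrative and the function names hypothetical.

```python
import math

def reliability_of_sum(s_a, s_b, r_aa, r_bb, r_ab):
    """Formula (7): reliability of T = A + B."""
    true_var = s_a**2 * r_aa + s_b**2 * r_bb + 2 * s_a * s_b * r_ab
    total_var = s_a**2 + s_b**2 + 2 * s_a * s_b * r_ab
    return true_var / total_var

def reliability_of_part(s_a, s_b, s_t, r_tt, r_bb, r_ab):
    """Formula (8): reliability of A from r_TT, r_BB and r_AB."""
    return (s_t**2 * r_tt - s_b**2 * r_bb - 2 * s_a * s_b * r_ab) / s_a**2

# Illustrative constants.
s_a, s_b, r_aa, r_bb, r_ab = 5.0, 8.0, 0.80, 0.70, 0.45
s_t = math.sqrt(s_a**2 + s_b**2 + 2 * s_a * s_b * r_ab)
r_tt = reliability_of_sum(s_a, s_b, r_aa, r_bb, r_ab)
print(round(r_tt, 4), round(reliability_of_part(s_a, s_b, s_t, r_tt, r_bb, r_ab), 4))
```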


Usually the reliability of single elements rather than that of the sum of a group of elements is known. When A or B or both consist of more than one element, the reliability of both A and B must be determined before the reliability of the whole may be obtained. Formula (7) enables us, however, to ascertain the reliabilities of A and B. For example, let A consist of two, and B of three elements. To get the reliability of the A group, substitute in formula (7) for $r_{AA}$ the reliability of one element, for $r_{BB}$ the reliability of the other, and for $r_{AB}$ the correlation between them. $r_{TT}$, then, will give the reliability of part A. In part B, consisting of three elements, one can first determine the reliability of the sum of two elements by means of the procedure outlined above. To get the reliability of the whole part B, substitute in formula (7) for $r_{AA}$ the reliability of the sum of the two elements in B, for $r_{BB}$ the reliability of the third element, and for $r_{AB}$ the correlation between the sum of the two elements and the third. $r_{TT}$, then, will give the reliability of part B. Having thus obtained the reliabilities of parts A and B, to get the reliability of the whole test substitute the reliability of part A for $r_{AA}$, the reliability of part B for $r_{BB}$, and the correlation of parts A and B for $r_{AB}$. $r_{TT}$, finally, gives the reliability of the whole test.
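The step-by-step procedure just described (two elements for A, three for B, building each part up one element at a time with formula (7)) can be sketched as below. The five elements, their SDs, reliabilities, and intercorrelations are hypothetical, and `reliability` is an illustrative helper; the correlation between a partial sum and the next element is computed from element covariances.

```python
import math
from itertools import product

# Hypothetical elements: SDs, reliabilities, and intercorrelations.
SD = [4.0, 5.0, 3.0, 6.0, 2.5]
REL = [0.80, 0.75, 0.70, 0.85, 0.65]
R = [
    [1.00, 0.50, 0.30, 0.20, 0.25],
    [0.50, 1.00, 0.40, 0.35, 0.30],
    [0.30, 0.40, 1.00, 0.45, 0.20],
    [0.20, 0.35, 0.45, 1.00, 0.40],
    [0.25, 0.30, 0.20, 0.40, 1.00],
]

def cov(g1, g2):
    """Covariance between the sums of two groups of elements."""
    return sum(SD[i] * SD[j] * R[i][j] for i, j in product(g1, g2))

def formula7(s_a, s_b, r_aa, r_bb, r_ab):
    """Reliability of A + B from formula (7)."""
    num = s_a**2 * r_aa + s_b**2 * r_bb + 2 * s_a * s_b * r_ab
    den = s_a**2 + s_b**2 + 2 * s_a * s_b * r_ab
    return num / den

def reliability(group):
    """Build up the reliability of a sum one element at a time, as in the text."""
    done, rel = [group[0]], REL[group[0]]
    for nxt in group[1:]:
        s_done = math.sqrt(cov(done, done))
        r_between = cov(done, [nxt]) / (s_done * SD[nxt])
        rel = formula7(s_done, SD[nxt], rel, REL[nxt], r_between)
        done.append(nxt)
    return rel

A, B = [0, 1], [2, 3, 4]
r_aa, r_bb = reliability(A), reliability(B)
s_a, s_b = math.sqrt(cov(A, A)), math.sqrt(cov(B, B))
r_tt = formula7(s_a, s_b, r_aa, r_bb, cov(A, B) / (s_a * s_b))
print(round(r_aa, 4), round(r_bb, 4), round(r_tt, 4))
```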

It is to be noted in connection with formula (8) that when part B consists of several elements of known reliability, the reliability of part B may be obtained by means of formula (7). However, if part A consists of a number of elements of unknown reliability, even though the reliability of part A as a whole may be ascertained by means of formula (8), a knowledge of the reliabilities of all but one element of A is required for the determination of the reliability of each and every element in A.

¹ Tryon, R. C.: "The Reliability Coefficient as a Per Cent, with the Application to the Correlation between Abilities." Psychol. Rev., Vol. XXXVII, 1930, pp. 140-157.


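3. CORRELATIONS CORRECTED FOR ATTENUATION

It is sometimes desired to obtain coefficients corrected for attenuation. With $r_{AB}$, that is, the correlation between different elements or groups of elements of a composite, the ordinary form of the correction for attenuation,

$$r_{A_\infty B_\infty} = \frac{r_{AB}}{\sqrt{r_{AA} r_{BB}}},$$

may be applied. However, with $r_{AT}$ or $r_{BT}$, which are correlations between an element or a group of elements and a composite containing it, this formula cannot be used, since it assumes that the correlation between errors in the two variables is zero. The writers suggest the following formulae to be used when A and B are either a single element or a sum of several elements.

Case 1.—Knowing $r_{AB}$, $\sigma_A$, $\sigma_B$, $r_{AA}$, and $r_{BB}$, find $r_{A_\infty T_\infty}$. Correcting each term in formula (1) for errors of measurement, we have

$$r_{A_\infty T_\infty} = \frac{\sigma_A r_{AA} + \sigma_B r_{AB}}{\sigma_T\sqrt{r_{AA} r_{TT}}} \tag{9a}$$

Substituting for $r_{TT}$ from formula (7) and simplifying,

$$r_{A_\infty T_\infty} = \frac{\sigma_A r_{AA} + \sigma_B r_{AB}}{\sqrt{r_{AA}\left(\sigma_A^2 r_{AA} + \sigma_B^2 r_{BB} + 2\sigma_A\sigma_B r_{AB}\right)}} \tag{9b}$$

Case 2a.—Knowing $r_{AT}$, $r_{AB}$, $\sigma_A$, $\sigma_B$, $\sigma_T$, $r_{AA}$, and $r_{TT}$, find $r_{A_\infty B_\infty}$. Correcting each term in formula (2) for errors of measurement, we have

$$r_{A_\infty B_\infty} = \frac{\sigma_T r_{A_\infty T_\infty}\sqrt{r_{TT}} - \sigma_A\sqrt{r_{AA}}}{\sqrt{\sigma_T^2 r_{TT} + \sigma_A^2 r_{AA} - 2\sigma_A\sigma_T r_{A_\infty T_\infty}\sqrt{r_{AA} r_{TT}}}}$$

Substituting for $r_{A_\infty T_\infty}$ from formula (9a) and simplifying,

$$r_{A_\infty B_\infty} = \frac{\sigma_B r_{AB}}{\sqrt{r_{AA}\left(\sigma_T^2 r_{TT} - \sigma_A^2 r_{AA} - 2\sigma_A\sigma_B r_{AB}\right)}} \tag{10}$$

Case 2b.—Knowing $r_{AT}$, $r_{AB}$, $\sigma_A$, $\sigma_B$, $\sigma_T$, $r_{AA}$, and $r_{TT}$, find $r_{B_\infty T_\infty}$. Correcting each term in formula (3) for errors of measurement, we have

$$r_{B_\infty T_\infty} = \frac{\sigma_T\sqrt{r_{TT}} - \sigma_A r_{A_\infty T_\infty}\sqrt{r_{AA}}}{\sqrt{\sigma_T^2 r_{TT} + \sigma_A^2 r_{AA} - 2\sigma_A\sigma_T r_{A_\infty T_\infty}\sqrt{r_{AA} r_{TT}}}}$$

Substituting for $r_{A_\infty T_\infty}$ from formula (9a) and simplifying,

$$r_{B_\infty T_\infty} = \frac{\sigma_T^2 r_{TT} - \sigma_A^2 r_{AA} - \sigma_A\sigma_B r_{AB}}{\sqrt{\sigma_T^2 r_{TT}\left(\sigma_T^2 r_{TT} - \sigma_A^2 r_{AA} - 2\sigma_A\sigma_B r_{AB}\right)}} \tag{11}$$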
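Formulae (9b), (10), and (11) can be coded and checked against the quantities they are meant to equal: (9a); the ordinary correction $r_{AB}/\sqrt{r_{AA} r_{BB}}$, which is valid between A and B because their errors are uncorrelated; and formula (1) applied to true-score SDs with A and B interchanged. The constants are illustrative, $r_{TT}$ is obtained from formula (7), and the function names are hypothetical.

```python
import math

def true_r_at(s_a, s_b, r_ab, r_aa, r_bb):
    """Formula (9b): A's true-score correlation with the true composite T = A + B."""
    return (s_a * r_aa + s_b * r_ab) / math.sqrt(
        r_aa * (s_a**2 * r_aa + s_b**2 * r_bb + 2 * s_a * s_b * r_ab))

def true_r_ab(s_a, s_b, s_t, r_ab, r_aa, r_tt):
    """Formula (10): true correlation of A with the remainder B, using r_TT."""
    return s_b * r_ab / math.sqrt(
        r_aa * (s_t**2 * r_tt - s_a**2 * r_aa - 2 * s_a * s_b * r_ab))

def true_r_bt(s_a, s_b, s_t, r_ab, r_aa, r_tt):
    """Formula (11): true correlation of the remainder B with the true composite."""
    core = s_t**2 * r_tt - s_a**2 * r_aa
    return (core - s_a * s_b * r_ab) / math.sqrt(
        s_t**2 * r_tt * (core - 2 * s_a * s_b * r_ab))

# Illustrative constants, with r_TT obtained from formula (7).
s_a, s_b, r_ab, r_aa, r_bb = 5.0, 8.0, 0.45, 0.80, 0.70
s_t = math.sqrt(s_a**2 + s_b**2 + 2 * s_a * s_b * r_ab)
r_tt = (s_a**2 * r_aa + s_b**2 * r_bb + 2 * s_a * s_b * r_ab) / s_t**2

at = true_r_at(s_a, s_b, r_ab, r_aa, r_bb)
ab = true_r_ab(s_a, s_b, s_t, r_ab, r_aa, r_tt)
bt = true_r_bt(s_a, s_b, s_t, r_ab, r_aa, r_tt)
print(round(at, 4), round(ab, 4), round(bt, 4))
```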




